Abstract

Background: Previously, we suggested prototypal models that describe some 
clinical states based on group postulates. Here, we demonstrate a group/category 
theory-like model for molecular/genetic biology as an alternative application of our 
previous model. Specifically, we focus on deoxyribonucleic acid (DNA) base sequences. 

Results: We construct a wallpaper pattern based on a five-letter cruciform motif with 
letters C, A, T, G, and E. Whereas the first four letters represent the standard DNA bases, 
the fifth is introduced for ease in formulating group operations that reproduce 
insertions and deletions of DNA base sequences. A basic group Z_5 = {r, u, d, l, n}
of operations is defined for the wallpaper pattern, with which a sequence of
points can be generated corresponding to changes of a base in a DNA sequence
by following the orbit of a point of the pattern under operations in group Z_5.
Other manipulations of DNA sequence can be treated using a vector-like notation 'D_j'
corresponding to a DNA sequence but based on the five-letter base set; also, 'D_j's are
expressed graphically. Insertions and deletions of a series of letters 'E' are admitted to
assist in describing DNA recombination. Likewise, a vector-like notation R_j can be
constructed for sequences of ribonucleic acid (RNA). The wallpaper group B = {Z_5^(×∞); ∘, •}
(an ∞-fold Cartesian product of Z_5) acts on D_j (or R_j) yielding changes to D_j (or R_j)
denoted by 'D_j∘B_(j→k) = D_k' (or 'R_j∘B_(j→k) = R_k'). Based on the operations of this
group, two types of groups (a modulo 5 linear group and a rotational group over
the Gaussian plane, acting on the five bases) are linked as parts of the wallpaper
group for broader applications. As a result, changes, insertions/deletions and DNA
(RNA) recombination (partial/total conversion) are described. As an exploratory 
study, a notation for the canonical "central dogma" via a category theory-like way 
is presented for future developments. 

Conclusions: Despite the large incompleteness of our methodology, there is 
fertile ground to consider a symmetry model for genetic coding based on our 
specific wallpaper group. A more integrated formulation containing "central 
dogma" for future molecular/genetic biology remains to be explored. 

Keywords: DNA (RNA) bases, Imaginary base, Wallpaper group, Operation, 
Cartesian vector, Category, Central dogma 
Background

Group theory is the cornerstone in classifying and studying abstract concepts involving 
symmetry [1,2]. In general, when group theory is used in various fields of natural sciences, 
it plays an important role in describing geometrical or dynamical symmetries of phenom- 
ena under consideration; examples include mathematics [3,4], physics [5-8], chemistry [9], 
molecular/genetic biology [10-22], and anthropology [23]. Moreover, much fertile ground 
still exists where group theory can display its versatility from a multitude of viewpoints. 
To our knowledge, one such candidate is molecular/genetic biology where group theory 
has already provided great contributions [10-22]. 

Deoxyribonucleic acid (DNA) is a nucleic acid containing genetic instructions coded 
in ordered sequences of four bases located in genes that determine specific genetic 
characteristics of an organism. In the canonical Watson-Crick DNA base pairing, aden- 
ine (A) forms a base pair with thymine (T) and guanine (G) forms a base pair with 
cytosine (C) [24-26]. Similarly, ribonucleic acid (RNA), which has various biological 
roles, is a molecule that has a much shorter chain of nucleotides. The sequence of 
DNA consisting of bases 'A, C, T and G' is transcribed into RNA, composed of bases
'A, C, U and G'; the sets differ in that 'U (uracil)' replaces 'T (thymine)'.

Over the latter half of the 20th century, the nature of the genetic code became fairly well 
established. As for the coding sequences of DNA into nucleotide units, one needs to build 
up more general, sophisticated, rationally functionalized systematics concerning 
DNA base sequences that will enable genes to be understood at the molecular biol- 
ogy level in more optimized form. Indeed, many approaches have been undertaken 
to describe gene characteristics from various viewpoints within the participating
disciplines [24-42]. In particular, the concept of 'symmetry' for DNA sequences plays
an important role in understanding their characteristics.

However, each has its advantages and disadvantages in terms of utility and convenience 
in applications. To our knowledge, so far, if we intend to incorporate a sequence of bases 
into another sequence and/or exclude certain bases from that substitution, we need 
to look further afield because normally, sequencing and inserting-deleting operations 
cannot help in distinguishing one from the other. That means that multiple types 
of operations are necessary if features of DNA containing exceptional sequences 
are to be treated. 

Previously, we suggested prototypal models that describe some clinical states based 
on group postulates [43]. In this article, we demonstrate a group/category theory-like 
model for molecular/genetic biology as an alternative application of our previous 
model. Specifically, focusing on DNA base sequences, we present a simple model 
where not only changes in sequences of DNA bases but also insertion, deletion, 
and recombination (partial/total conversion) of DNA bases are treatable within some 
simple rules via the combination of a set and a group defined over some specific wallpaper 
pattern. Moreover, a category theory-like formalism, where a description of the DNA 
bases and their transcription to RNA bases can be made, is attempted from which a 
category theory-like framework is constructed requiring as few and as simple rules as 
possible. As an example, by assimilating the canonical "central dogma" [26], we hope to 
provoke more interactivity among those interested branches of natural science, if possible. 
The methodology consists of eight parts, the content of which is built-up step-by-step as 
scope is enlarged to encompass the more advanced themes. 
§1 A preliminary setting describing a wallpaper pattern used as a symmetry
model for DNA sequences 

First, we consider a certain wallpaper pattern that helps us to visualize the operations 
of the present model (see Figure 1) [2,44-47]. There, the pattern comprises repetitions 
of a cruciform motif with each motif consisting of five letters E, C, A, T, and G with 
the latter four letters equally spaced at the points of a cross about a central E. The motif 
generates the pattern through a translation specified as a knight's move in chess (two
steps out and one step right). In this way, the grid-points in this regular wallpaper pattern
can be obtained uniquely and be extended indefinitely. Note also that each horizontal line 
is generated by repetitions of the sequence E-C-A-T-G. Moreover, the line above is a dis- 
placed copy of the one below with letter A placed directly above letter E. This preserves 
the condition that any cruciform is composed of one each of the five letters. 
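
The construction above can be sketched in a few lines. This is only an illustrative model under stated assumptions: the coordinate convention (column x grows to the right, row y grows upward, and an 'E' sits at the origin) and the function names `letter_at`, `cruciform` and `render` are hypothetical and do not come from the article. The rule "letter index = (x + 2y) mod 5" is one way to encode the row E-C-A-T-G together with the displacement that places 'A' directly above 'E'.

```python
# Illustrative sketch of the wallpaper pattern (names and coordinates are assumptions).
LETTERS = "ECATG"  # order of letters along a horizontal line: E=0, C=1, A=2, T=3, G=4


def letter_at(x: int, y: int) -> str:
    """Letter at column x, row y; each row is the row below shifted so A sits above E."""
    return LETTERS[(x + 2 * y) % 5]


def cruciform(x: int, y: int) -> str:
    """Centre letter followed by its right, upper, lower and left neighbours."""
    points = [(x, y), (x + 1, y), (x, y + 1), (x, y - 1), (x - 1, y)]
    return "".join(letter_at(px, py) for px, py in points)


def render(width: int = 10, height: int = 5) -> str:
    """Plain-text picture of the pattern, top row first."""
    rows = []
    for y in range(height - 1, -1, -1):
        rows.append(" ".join(letter_at(x, y) for x in range(width)))
    return "\n".join(rows)


if __name__ == "__main__":
    print(render())
```

Under these assumptions every cruciform contains each of the five letters exactly once, and shifting by one column and two rows (a knight's move) maps the pattern onto itself.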

The wallpaper pattern as an array of cruciforms is capable of being constructed as 
stacks of a unit cell (the 5x5 square enclosed in the dotted line in Figure 1) by 
horizontal and vertical translations [2,44-47]. The positions of the bases of the 
cruciform motif are so determined to make it easier to determine the complementary 
base of each base; the practical applications are clarified later. We introduce the letter 
'E' to indicate an 'empty' base which is treated in the same way as the other bases at 
least for display purposes. This five-base scheme is adopted to aid the notion of group 
composition in our model. 

In addition, we focus on a point 'P' on the wallpaper pattern (i.e., the grid-point array 
in Figure 1), to compose a certain DNA base sequence. In accordance with this, we 
shall always adjoin a series of letters that are determined as a trajectory of the point 
'P' (also called the orbit of 'P') over the wallpaper pattern. For instance, when we
identify or recognize some changes of DNA bases with 'P' moving from 'A → C → E'
over the wallpaper pattern, then this represents a series of changes to one base located
at a specific position of a DNA sequence in the manner 'ACE...' or '...A...' →
'...C...' → '...E...'. The orbit of 'P' can describe a series of sequences of DNA bases, or a series
of changes of each letter in the same place, although, in this article, we focus mainly
on the latter case unless stated otherwise.

Figure 1 Wallpaper pattern using the five bases. A point 'P' is assumed to move step-by-step over the
wallpaper-like grid-point array where the four DNA bases 'C, A, T, G' and the imaginary 'E' forming the cruciform
motif are used to generate the pattern. The unit cell enclosed by the blue dashed line can also be used to
establish the pattern.

With these postulates, we consider the set C_5 = {C, A, T, G, E}. If the point 'P' moves
onto an 'E', 'E' must be included and identified in the series of letters, as in 'ACGET', for
example. This is interpreted as the series of DNA bases 'ACGT'. Thus, 'E' depends on
context; that is, 'E' can be inserted or removed from any series where we would like to
include or eliminate 'E's so long as these are recognized/tracked in the entire process.
When read from left to right, the place number of each letter in the series is subscripted,
as in 'A_1 C_2 G_3 T_4'. After insertions/deletions, the place number is augmented/diminished
depending on initial and final positions; hence, following three insertions,
'A_1 C_2 G_3 T_4' → 'A_1 E_2 C_3 G_4 E_5 E_6 T_7'; this means the point 'P' takes the place 'E' once
between A_1 and C_3, and twice between G_4 and T_7 over the wallpaper pattern in Figure 1.
More details are to be given later.

As a further refinement, the orbit of 'P' can be stated as a sequence of shift
operations as follows: let 'r' denote a move one step to the right, corresponding to, say,
A → T, T → G or G → E. Similarly, we denote 'l': move one step to the left, as for C → E
and E → G; 'u': move one step up; and 'd': move one step down. We include 'n' to designate
'no move' (remain at the same point). A sequence of 'r', 'l', 'u', 'd' and 'n' then provides a
position-independent means to describe the orbit of 'P'; any of these five operations can
be applied to any of the five letters. We denote their operations on 'P' in the following
way. If point 'P' moves from 'E' to 'C' (step to the right), we write 'E∘r = C', where '∘r'
signifies 'apply r to E' (see Figure 1). In a similar way, 'E∘l = G', 'E∘u = A', 'E∘d = T' and 'E∘n = E'.
Note though that each operator means a change of one base to another base within these
five bases; the meaning of '=' is not the degree of translation but equivalence to the
resultant base from the wallpaper pattern.
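
As a minimal sketch of how these five operations act on the five letters, each operation can be modelled as a fixed shift along the cyclic row E-C-A-T-G. The shift values (r = 1, u = 2, d = 3, l = 4, n = 0) follow from the actions on 'E' quoted above; the names `act` and `orbit` are hypothetical helpers introduced only for illustration.

```python
# Illustrative sketch: operations of the model as shifts modulo 5 (names are assumptions).
BASES = "ECATG"                                   # E=0, C=1, A=2, T=3, G=4
SHIFT = {"n": 0, "r": 1, "u": 2, "d": 3, "l": 4}  # steps along E-C-A-T-G


def act(base: str, op: str) -> str:
    """Return the base obtained by applying one operation, i.e. base∘op."""
    return BASES[(BASES.index(base) + SHIFT[op]) % 5]


def orbit(start: str, ops: str) -> str:
    """Letters visited by the point 'P' when the operations in `ops` are applied in order."""
    visited = [start]
    for op in ops:
        visited.append(act(visited[-1], op))
    return "".join(visited)


if __name__ == "__main__":
    print(orbit("A", "ll"))  # two steps to the left starting from A
```

With this encoding the orbit 'A → C → E' mentioned in the text corresponds to applying 'l' twice.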

To shorten multiple applications of the operations, we introduce '•' to denote the
composition of two operations, for example, '((E∘r)∘u) = E∘r•u'. From Figure 1, we find
'E∘d = T' yields the same change as 'E∘r•u = T'. As other examples, 'r•r•d = n' results in
'r•r = u', and 'd•d•l = n' results in 'd•d = r', because from Figure 1, 'E∘r•r = E∘u = A', and
'E∘d•d = E∘r = C'. All possible one-step changes between letters 'C, A, T, G, E' and
operators 'r, u, d, l, n', and all possible compositions of operators for the wallpaper pattern
of Figure 1 are presented in Figures 2 and 3, and Appendix A.

The binary compositions among the five operations 'r, u, d, l and n' can be shown to
satisfy the Abelian group postulates (wallpaper group/plane symmetry group/plane
crystallographic group [2-4,44-47]). Indeed, let Z_5 = {r, u, d, l, n}; then {Z_5, •} is the
Abelian group of order five. That is, for all elements x, y, z ∈ Z_5, we have:

1) Associativity: x•(y•z) = (x•y)•z;

2) Identity: 'n' is an identity element such that x•n = n•x = x;

3) Inverse: a unique element x^(-1) exists such that x•x^(-1) = x^(-1)•x = n (x^(-1) is called the
inverse element of x);

4) Commutativity: x•y = y•x;

5) Closure: any composition x•y belongs to Z_5.
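
The five postulates can be checked exhaustively, since the group has only five elements. The sketch below assumes the shift encoding n = 0, r = 1, u = 2, d = 3, l = 4 (addition modulo 5); the names `compose`, `inverse` and `is_abelian_group` are hypothetical and serve only as an illustration, not as the authors' procedure.

```python
# Illustrative brute-force check of the Abelian group postulates (names are assumptions).
from itertools import product

OPS = "nrudl"
SHIFT = {"n": 0, "r": 1, "u": 2, "d": 3, "l": 4}
BY_SHIFT = {shift: op for op, shift in SHIFT.items()}


def compose(x: str, y: str) -> str:
    """Composition x•y, modelled as addition of shifts modulo 5."""
    return BY_SHIFT[(SHIFT[x] + SHIFT[y]) % 5]


def inverse(x: str) -> str:
    """Element that composes with x to give the identity 'n'."""
    return BY_SHIFT[(-SHIFT[x]) % 5]


def is_abelian_group() -> bool:
    pairs = list(product(OPS, repeat=2))
    triples = list(product(OPS, repeat=3))
    closure = all(compose(x, y) in OPS for x, y in pairs)
    associative = all(compose(x, compose(y, z)) == compose(compose(x, y), z)
                      for x, y, z in triples)
    identity = all(compose(x, "n") == x == compose("n", x) for x in OPS)
    inverses = all(compose(x, inverse(x)) == "n" == compose(inverse(x), x) for x in OPS)
    commutative = all(compose(x, y) == compose(y, x) for x, y in pairs)
    return closure and associative and identity and inverses and commutative


if __name__ == "__main__":
    # Print the composition (Cayley) table row by row.
    for x in OPS:
        print(x, " ".join(compose(x, y) for y in OPS))
```

In this encoding closure holds by construction; the remaining checks confirm the relations quoted in the text, such as r•r = u and d•d = r.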



Sawamura et al. Theoretical Biology and Medical Modelling 2014, 11:18
http://www.tbiomed.com/content/11/1/18






 ∘ | r (↔ ω_1) | u (↔ ω_2) | d (↔ ω_3) | l (↔ ω_4) | n (↔ ω_5 = ω_0)
 E |     C     |     A     |     T     |     G     |        E
 C |     A     |     T     |     G     |     E     |        C
 A |     T     |     G     |     E     |     C     |        A
 T |     G     |     E     |     C     |     A     |        T
 G |     E     |     C     |     A     |     T     |        G

'↔' : bijection

Figure 2 Cayley tables for the five bases and five operations of linear/rotational groups. Any of the
five operations on any of five bases yields a base in a cyclic order.
Figure 3 Cayley tables for the linear group, rotational group and wallpaper group for five bases.
This confirms the bijection between the wallpaper group and rotational group.
Therefore, Z_5 is an Abelian group [2-4,44-47]. The inverses for each of the elements
are:

r^(-1) = l, l^(-1) = r, u^(-1) = d, d^(-1) = u, n^(-1) = n, (1)

which can be used to complete the composition table— also known as the Cayley table 
of the group. 

We further stipulate that when we perform these operations, we always
assume/identify the coding of the sequence of DNA bases in accordance with these
operations, and vice versa. This is because the action of 'u' on 'E' yields base 'A',
that of 'd' on 'E' yields base 'T', that of 'l' on 'E' yields base 'G', and naturally that
of 'n' on 'E' results in the same 'E'. For a more complex example, we might insert
a certain series of 'A's in 'ACCGT' between the 3rd and 4th base. To begin, we
decide to write this manipulation as follows: 'ACC( )GT' is transformed into 'ACC
(E)GT' by inserting 'E'. Next, the operation 'u' on the new 4th component
'E' yields 'E → A' ('ACCGT' → 'ACCAGT'), and vice versa; that is, 'd' operating on the 4th
component 'A' produces 'A → E' ('ACCAGT' → 'ACCGT'). In this way, appropriate use of
'E's through the adequate combination of operators of Z_5 enables us to express inclusion
and/or exclusion of any base between bases in a DNA sequence. To indicate this, we
adopt a vector-like description with an infinite number of 'E's being assumed to be
present at the end of any given base sequence. This means the point 'P' takes 'E's
an infinite number of times over the wallpaper pattern (Figure 1); i.e.,

D_j = [C|T|G|A|T|A|A|C|E|E|E|E|E|E|...]

= [C_1|T_2( )G_3|A_4|T_5( )A_6|A_7|C_8|E_9|E_10|E_11|E_12|E_13|E_14|...] (2a)

= [C_1|T_2(E_3)G_4|A_5|T_6(E_7|E_8)A_9|A_10|C_11|E_12|E_13|E_14|E_15|E_16|...] (2b)

(j: the number of the sequence; N: the number of single-stranded DNA bases of 'D_j'
except for the infinite tail of 'E's; in the above case, N = 8)

In the last expression (2b), 'E's are inserted before the 3rd component 'G_3' and the
6th component 'A_6', at the places marked by '( )' in formula (2a); the place numbers of
the components from the 3rd onward are incremented by '1', and those from the
6th onward by '1 + 2'. Likewise, we assume that the deletion
of any 'E's that are already displayed in D_j is always permissible according to need
with the place numbers being decreased by the necessary size.

Essentially, we regard the subscripted place number of a component of D_j, e.g., '3' of 'A_3',
as a convenient place mark to help in recognizing and counting the order of sequences.
Place numbers remain fixed when performing operations within a series of operations
during code recognition of bases. However, for an operation, another place number is always
permissible in principle, from where indexing of a specific DNA base sequence starts.

Alternatively, we use the following notation to describe various cases:

1) we denote by '{D_j}' a sequence 'D_j' where specified 'E's other than the trailing series
of 'E's are left implicit but the place number indexing is retained; i.e.,

{D_j} = [C_1|T_2|G_4|A_5|T_6|A_9|A_10|C_11|E_12|E_13|E_14|E_15|E_16|...]. (3a)
Here, the explicitly indicated place numbers are the same as in (2b) and missing
subscripted place numbers indicate omitted 'E's. Hence, (3a) without trailing 'E's
and subscripts represents an ordinary/conventional DNA sequence.
2) we denote by '<D_j>' a sequence 'D_j' where specified explicit 'E's other than the
trailing 'E's are deleted (changed into implicit 'E's) and the base sequence is
re-indexed with sequential place numbers, i.e.,

<D_j> = [C_1|T_2|G_3|A_4|T_5|A_6|A_7|C_8|E_9|E_10|E_11|E_12|E_13|...]. (3b)

Note that 'E's other than the trailing 'E's are not recognized as explicit components
and hence are not indexed. Additional insertions/deletions of 'E's are permitted after
deletions of 'E's; therefore, apart from the trailing 'E's, (3b) signifies an
ordinary/conventional DNA sequence.

Although both are equivalent to 'CTGATAAC' as actual DNA sequence expressions, the
related expressions {D_j} and <D_j> differ from each other; the former retains all
information regarding inserted 'E's and place numbers whereas the latter does not.

In an extension of the notation, a multiple sequence of deletions of 'E's (say t times)
can be written as a t-tuple of '< >'s, denoted '<<<<D_j>>>> (t-tuple) = <D_j>^t'. The final
expression is without explicit 'E's other than those trailing at the end, and thus formulates
a genuine DNA sequence after the appearance of indels. (Short for insertion/deletion
markers, indels are strings of mutated base pairs.) Similarly for the operation '{ }', we
have '{{{D_j}}} (t-tuple) = {D_j}^t'. The operations '{ }' and '< >' can be performed freely when
necessary; if further indels occur at, say, 'G_3' and 'A_7' in

<D_j> = [C_1|T_2|G_3|A_4|T_5|A_6|A_7|C_8|E_9|E_10|E_11|E_12|E_13|...],

then <D_j> changes into

<D_j1> = [C_1|T_2(E_3)A_4|T_5|A_6(E_7)C_8|E_9|E_10|E_11|E_12|E_13|...],

and subsequently into

<<D_j1>> = [C_1|T_2|A_3|T_4|A_5|C_6|E_7|E_8|E_9|E_10|E_11|...].

The sequence <D_j1> contains implicit 'E's aside from the trailing 'E's, and can be written as

{<D_j1>} = [C_1|T_2|A_4|T_5|A_6|C_8|E_9|E_10|E_11|E_12|E_13|...].

Naturally, {<D_j1>} and <D_j1> are equivalent, but <<D_j1>> and <D_j1> differ. Moreover,
as long as place numbers are recognized/traced precisely, combinations of manipulations
'{ }' and '< >' are allowed; e.g., {<{{<D_j1>}}>}. Hence, with appropriate use, we could treat
(read, interpret, describe, record) conventional sequences of DNA via '{D_j}' or '<D_j>'.
However, below we shall focus on simple sequences 'D_j'.

Looking at the beginning of a base sequence as in the following:

D_j = [C_1|G_2|A_3|C_4|...|T_i|...|A_(N-1)|T_N|E_(N+1)|E_(N+2)|E_(N+3)|...],

(i: i-th component of D_j, N: the number of bases of D_j)
a directionality for any D_j can be imposed;

D_j(5→3) = [C_1|G_2|A_3|C_4|...|T_i|...|A_(N-1)|T_N|E_(N+1)|E_(N+2)|E_(N+3)|...]

and

D_j(3→5) = [T_1|A_2|...|T_(N+1-i)|...|C_(N-3)|A_(N-2)|G_(N-1)|C_N|E_(N+1)|E_(N+2)|E_(N+3)|...].
The notation '(5 → 3)' and '(3 → 5)' is simply an additional label representing the two
possible types of endings of single-stranded DNA. Nonetheless, when the number of
bases is finite, two sequences can be equivalent, as for example

D_j(5→3) = [C_1|G_2|A_3|C_4|T_5|A_6|T_7]

and

D_j(3→5) = [T_1|A_2|T_3|C_4|A_5|G_6|C_7],

unless the prime endings <5' (five prime) → 3' (three prime)> or <3' → 5'> accompany
the sequence designation.

In accordance with these postulates, we now can define the set D = {D_j (j = 1,2,3,...) |
D_j ∈ C_5 × C_5 × C_5 × ... (N times, N ≤ ∞)} as the set of all possible sequences of recognized
N-tuple single-stranded DNA bases. We can regard N to be a positive integer or infinity.

An analogous definition is clearly possible for the set R of RNA sequences; with 'T'
substituted by 'U', operations of group Z_5 are similarly definable because all results obtained
for DNA pertain to RNA under the base substitution. Thus, set R = {R_j (j = 1,2,3,...) | R_j ∈
C_5 × C_5 × C_5 × ... (N times, N ≤ ∞)} is the set of all possible sequences of recognized N-tuple
single-stranded RNA bases with C_5 = {C, A, U, G, E}.
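
The two readings '{D_j}' and '<D_j>' introduced in this section can be illustrated with a short sketch. It is only a toy model under stated assumptions: a sequence is given as a plain string holding its finite part (the infinite tail of 'E's is left out), and the function names `place_numbers`, `curly` and `angle` are hypothetical.

```python
# Illustrative sketch of the '{ }' and '< >' readings (names and data layout are assumptions).
def place_numbers(seq: str):
    """Explicit reading: every letter, explicit 'E's included, with its place number."""
    return [(letter, place) for place, letter in enumerate(seq, start=1)]


def curly(seq: str):
    """{D_j}: explicit 'E's become implicit, original place numbers are retained."""
    return [(letter, place) for letter, place in place_numbers(seq) if letter != "E"]


def angle(seq: str):
    """<D_j>: explicit 'E's are deleted and the remaining bases are re-indexed."""
    kept = "".join(letter for letter in seq if letter != "E")
    return place_numbers(kept)


if __name__ == "__main__":
    d_j = "CTEGATEEAAC"  # finite part of expression (2b)
    print(curly(d_j))
    print(angle(d_j))
```

Applied to the finite part of (2b), `curly` keeps the gaps in the place numbers as in (3a), whereas `angle` yields the sequential numbering of (3b).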

§2 Group composition that yields changes in DNA bases via a Cartesian vector 

Next, we can consider B = {B_m (m = 1,2,3,...) | B_m ∈ Z_5 × Z_5 × Z_5 × ... (an N-fold product,
N = ∞)} = {Z_5^(×N), •}, where elements of B act on any D_j. This means that D_j covers all
possible sequences of the DNA bases, and this situation is the same for R_j of sequences
of RNA bases.

Because B is a Cartesian product of the same Abelian group, it is also Abelian, where
composition of any two elements of B is denoted by '•' [4]. Details are shown in Appendix
B and Figure 3. Accordingly, its formulation as a group B = {Z_5^(×N), •} is confirmed.

In a more general context, a Cartesian vector that is composed of the respective
operators 'b_(j→k)i' that effect the change of D_j into D_k is definable in the following way:

B_(j→k) = [b_(j→k)1 | b_(j→k)2 | b_(j→k)3 | ... | b_(j→k)i | ... | b_(j→k)(N-1) | b_(j→k)N | n_(N+1) | n_(N+2) | n_(N+3) | ...],

(N: the number of components).

Hence,

'D_j∘B_(j→k) = D_k'. (4)

Clearly, for arbitrary 'j' and 'k', there exists a unique 'm' such that 'B_(j→k) = B_m (m = 1,2,3,...)';
despite the difference in notation, the two are identical in practice.

Here, we present a simple example that consists of a multiple product of 'B_(j→k)'s.
Consider the scenario that a certain sequence of a single strand (or one side of a
double strand) of DNA transitions from D_1 to D_4 in stepwise fashion,

D_1 = [A_1|C_2|C_3|G_4|T_5|E_6|E_7|...] = [A_1|C_2|C_3( )G_4|T_5|E_6|E_7|...],
D_2 = [A_1|C_2|C_3(E_4|E_5)G_6|T_7|E_8|E_9|...],
D_3 = [A_1|C_2|C_3(A_4|T_5)G_6|T_7|E_8|E_9|...],
D_4 = [C_1|T_2|G_3(T_4|C_5)G_6|A_7|E_8|E_9|...].
We next consider the change 'D_1 → D_2'. There exists an operator 'B_(1→2) = [n_1|n_2|n_3
(r_4|u_5)l_6|d_7|n_8|n_9|...]' that is able to produce this change; specifically, the insertion of
two 'E's between 'C_3' and 'G_4' yields the change 'D_1 → D_2'. However, this sort of
manipulation can be troublesome. Hence, in our model, insertions/deletions of 'E's are
instead ascribed to the way the vector D_j is interpreted. This is preferable because it allows
easier manipulations. Next, we construct the operator 'B_(2→3)' that maps 'D_2 → D_3' (the
details are shown in Appendix C). With reference to Figures 1, 2 and 3, we find

B_(2→3) = [n_1|n_2|n_3(u_4|d_5)n_6|n_7|n_8|n_9|...].

In a similar manner,

B_(3→4) = [l_1|u_2|d_3(r_4|d_5)n_6|l_7|n_8|n_9|...].

Naturally, the final D_4 is obtained from D_1 recursively,

'D_1 → D_2', (5)

'D_2∘B_(2→3)•B_(3→4) = D_4'. (6)

From the decomposition 'B_(j→k) = B_(j→0)•B_(0→k) = B_j^(-1)•B_k', we obtain

D_j∘B_(j→k) = D_j∘B_(j→0)•B_(0→k) = D_0∘B_(0→k) = D_k, (7)

where

D_0 = [E_1|E_2|E_3|...|E_i|...|E_(N-1)|E_N|E_(N+1)|E_(N+2)|E_(N+3)|...] (8)

denotes the identity element of D.

Note that the group operations can act on D_j irrespective of whether the 'E's are
explicit or implicit as defined in §1. Moreover, any sequence 'D_j' can be presented as a
polygonal line; as an example, the evolution of changes 'D_1 → D_3' is displayed in
Figure 4.
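
The worked example of this section can be reproduced with a small sketch. It assumes that sequences are written as equal-length strings with explicit 'E's (trailing 'E's cut off after the seventh place) and that each operator is the shift n = 0, r = 1, u = 2, d = 3, l = 4 modulo 5; the function names `operator_vector`, `apply_vector` and `compose_vectors` are hypothetical.

```python
# Illustrative sketch of the operator vectors B_(j→k) (names and encoding are assumptions).
BASES = "ECATG"  # E=0, C=1, A=2, T=3, G=4
OPS = "nrudl"    # n=0, r=1, u=2, d=3, l=4


def operator_vector(src: str, dst: str) -> str:
    """Componentwise operators B with src∘B = dst; both strings must have equal length."""
    if len(src) != len(dst):
        raise ValueError("pad both sequences with explicit 'E's to the same length")
    return "".join(OPS[(BASES.index(b) - BASES.index(a)) % 5] for a, b in zip(src, dst))


def apply_vector(seq: str, ops: str) -> str:
    """Apply each operator to the base at the same place."""
    return "".join(BASES[(BASES.index(b) + OPS.index(o)) % 5] for b, o in zip(seq, ops))


def compose_vectors(first: str, second: str) -> str:
    """Componentwise composition first•second."""
    return "".join(OPS[(OPS.index(x) + OPS.index(y)) % 5] for x, y in zip(first, second))


if __name__ == "__main__":
    d1, d2, d3, d4 = "ACCGTEE", "ACCEEGT", "ACCATGT", "CTGTCGA"
    b12 = operator_vector(d1, d2)
    b23 = operator_vector(d2, d3)
    b34 = operator_vector(d3, d4)
    print(b12, b23, b34)
    print(apply_vector(d2, compose_vectors(b23, b34)))
```

Under these assumptions the three computed vectors agree with the operators quoted in the text for the first seven places, and composing B_(2→3) with B_(3→4) carries D_2 directly to D_4 as in (6).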



§3 Integration of a linear group and a rotational group as a wallpaper group 

Looking at the definitions of groups Z_5, D, and B, another approach is possible. The
five bases can be represented by five equispaced phasors with a '2π/5' angular phase
separation located on the unit circle on the Gaussian plane, as depicted in Figure 5.

Herein, in the Gaussian plane, if 'ω' is defined to be the counterclockwise rotational
angle 'ω = 2π/5 (rad)' and composition of 'ω' is denoted '•', then assuming 'ω' obeys the
'right translation rule', we have

ω = ω_1,
ω • ω = 2ω = ω_2,
ω • ω • ω = 3ω = ω_3, (9)
ω • ω • ω • ω = 4ω = ω_4,
ω • ω • ω • ω • ω = 5ω = ω_5 = ω_0 = 0 (= no rotation).



Figure 4 Graphical representations for changes of DNA sequences. Suppose the following sequences:
'D_1 = [A_1 C_2 C_3 ( ) G_4 T_5 E_6 E_7 E_8 ...]', 'D_2 = [A_1 C_2 C_3 (E_4 E_5) G_6 T_7 E_8 E_9 E_10 ...]' and
'D_3 = [A_1 C_2 C_3 (A_4 T_5) G_6 T_7 E_8 E_9 E_10 ...]'. The series of changes 'D_1 → D_2 → D_3' is drawn as
three polygonal lines in which the bases are linked as in the definition of group Z_5. There, we
recognize not the locations of the 'D's but merely the letters, index numbers and shapes.

The general form of an arbitrary base is expressed as 'X_m ↔ Exp(m • ω • i)' (here, 'i' is
the imaginary unit, ω = 2π/5 (rad), m = {0, 1, 2, 3, 4, 5}). With {*} meaning one of the
bases among 'C, A, T, G and E', we construct the following map. Denoting composition
by '∘', ω_m acts on the identity trivially and hence yields the correspondences

Exp(0·i) ↔ {Exp(0·ω·i)} = {1} = {1}∘ω_0 = E = X_0,
Exp(2πi/5) ↔ {Exp(1·ω·i)} = {1}∘ω_1 = C = X_1,
Exp(4πi/5) ↔ {Exp(2·ω·i)} = {1}∘ω_2 = A = X_2,
Exp(6πi/5) ↔ {Exp(3·ω·i)} = {1}∘ω_3 = T = X_3, (10)
Exp(8πi/5) ↔ {Exp(4·ω·i)} = {1}∘ω_4 = G = X_4,
X_5 = X_0 = E.



Expanding the operations for 'ω_1, ω_2, ω_3, ...' on bases 'C, A, T, G and E', we establish
for instance:

E∘ω_1 = C, C∘ω_1 = A, A∘ω_2 = G, T∘ω_3 = C, G∘ω_1 = E.
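
The phasor picture can be tried out numerically. In the sketch below, which is an illustration under stated assumptions rather than the authors' implementation, each base is the unit complex number Exp(m·ω·i) of correspondence (10), and the action of ω_m is multiplication by Exp(m·ω·i). The helper names `phasor`, `nearest_base` and `rotate` are hypothetical; `nearest_base` is needed only because floating-point products are not exact.

```python
# Illustrative sketch of the fivefold phasor representation (names are assumptions).
import cmath
import math

BASES = "ECATG"          # X_0 = E, X_1 = C, X_2 = A, X_3 = T, X_4 = G
OMEGA = 2 * math.pi / 5  # counterclockwise rotation angle in radians


def phasor(base: str) -> complex:
    """Unit complex number Exp(m·ω·i) assigned to the base X_m."""
    return cmath.exp(1j * OMEGA * BASES.index(base))


def nearest_base(z: complex) -> str:
    """Base whose phasor lies closest to z (absorbs floating-point rounding)."""
    return min(BASES, key=lambda b: abs(phasor(b) - z))


def rotate(base: str, m: int) -> str:
    """Action base∘ω_m: rotate the phasor of `base` by the angle m·ω."""
    return nearest_base(phasor(base) * cmath.exp(1j * OMEGA * m))


if __name__ == "__main__":
    for base, m in [("E", 1), ("C", 1), ("A", 2), ("T", 3), ("G", 1)]:
        print(f"{base}∘ω_{m} = {rotate(base, m)}")
```

The five printed cases are the ones listed in the sentence above.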
Figure 5 A phasor diagram using the five bases over the Gaussian plane (real and imaginary axes;
ω = 2π/5 (rad)). The five nucleic acid bases
label the points equispaced on the unit circle to form the fivefold phasor diagram over the Gaussian plane.
With 'ω' defining the counterclockwise rotation by '2π/5 (rad)' around the origin in the Gaussian plane,
composition of angles under modulo 5 addition generates a representation of the cyclic group. The
complex units X = {Exp(m • ω • i)} (i: imaginary unit, m: integer) follow as a bijection from angles to the plane.
The bases are assigned to each phase: 'Exp(0 • ω • i) = Exp(5 • ω • i) = 1 ↔ E', 'Exp(1 • ω • i) ↔ C', 'Exp(2 • ω • i) ↔ A',
'Exp(3 • ω • i) ↔ T', 'Exp(4 • ω • i) ↔ G'.



In continuance, the set P_ω = {ω_1, ω_2, ω_3, ω_4, ω_0 (= ω_5)} is readily confirmed to form
a group {P_ω, •} where the identity element is 'ω_0' and the inverse of 'ω_m' is

ω_0^(-1) = ω_0,
ω_1^(-1) = ω_4,
ω_2^(-1) = ω_3,
ω_3^(-1) = ω_2, (11)
ω_4^(-1) = ω_1,
ω_5^(-1) = ω_0 = 0.

Closure and associativity follow from (9) and (10).

Here, if we turn our attention to the wallpaper pattern, a further bijection obeying
the postulates of the wallpaper group can be confirmed. Corresponding to Figures 3
and 6, a bijection between the Cayley tables for translational and rotational operations
can be established:

r ↔ ω = ω_1,
u ↔ 2ω = ω_2,
d ↔ 3ω = ω_3, (12)
l ↔ 4ω = ω_4,
n ↔ 5ω = ω_5 = ω_0 = 0.



Naturally, inverses (e.g., ω_2^(-1) = ω_3) are preserved in accordance with the inverses for 'r,
u, d, l, and n'. Any right translation of the horizontal line in Figures 1 and 6
(translational group) is also expressible as a rotation over the fivefold phasor diagram
in Figure 5 (rotational group). Thus, these can be regarded as a synthesized form
of the wallpaper style (wallpaper group), from which expressions such as 'A∘r = A∘ω_1 =
T = E∘d' and 'A∘l = A∘ω_4 = C = E∘r' can be confirmed. All possible one-step changes
between 'A, C, T, G and E' and 'ω_1, ω_2, ω_3, ω_4 and ω_0' are shown in Figure 2.

Figure 6 Scheme for an accessorized wallpaper pattern synthesized from the linear group and
rotational group for the five bases. A bijection exists between the primitive operations of both
groups: 'r ↔ ω = ω_1, u ↔ 2ω = ω_2, d ↔ 3ω = ω_3, l ↔ 4ω = ω_4, n ↔ ω_0 = ω_5 = 0'. Transitions in the four
directions of the cruciform are expressible as rotations of the fivefold phasor diagram in the Gaussian
plane. In other words, the linear operation and the rotational operation of the five bases are
synthesized into a unique scheme (wallpaper pattern).

Therefore, this rule for 'E' does not break the postulates for set D, group Z_5, and
group B.
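
The bijection (12) between translations on the grid and rotations of the phasor diagram can be checked mechanically. The sketch assumes a grid in which the letter at column x and row y has index (x + 2y) mod 5 (row E-C-A-T-G, with 'A' directly above 'E'), and it represents the rotation ω_k by its integer k; all names (`STEP`, `ROTATION`, `translate`, `rotate`, `bijection_holds`) are hypothetical.

```python
# Illustrative check that grid translations match rotations (names are assumptions).
BASES = "ECATG"
STEP = {"r": (1, 0), "u": (0, 1), "d": (0, -1), "l": (-1, 0), "n": (0, 0)}
ROTATION = {"r": 1, "u": 2, "d": 3, "l": 4, "n": 0}  # bijection (12): operation ↔ ω_k


def letter_at(x: int, y: int) -> str:
    """Letter of the wallpaper pattern at column x, row y."""
    return BASES[(x + 2 * y) % 5]


def translate(x: int, y: int, op: str) -> str:
    """Letter reached from grid point (x, y) by one translation step."""
    dx, dy = STEP[op]
    return letter_at(x + dx, y + dy)


def rotate(base: str, k: int) -> str:
    """Letter reached from `base` by the rotation ω_k on the phasor diagram."""
    return BASES[(BASES.index(base) + k) % 5]


def bijection_holds(size: int = 5) -> bool:
    """True if every translation agrees with its paired rotation on a size × size cell."""
    return all(
        translate(x, y, op) == rotate(letter_at(x, y), ROTATION[op])
        for x in range(size)
        for y in range(size)
        for op in STEP
    )


if __name__ == "__main__":
    print(bijection_holds())
```

Checking one 5×5 unit cell suffices under these assumptions, because the pattern repeats with period five in both directions.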

§4 Methods to obtain complementary sequences from primary DNA 

Suppose, from among 'C, A, T, G and E', a base 'X_m' is given; its complementary base
'X_m†' to 'X_m' is defined as follows: for 'X_m = {Exp(m • ω • i)}, m = {0, 1, 2, 3, 4, 5}',
'X_m†' is obtained by 'X_m† = {Exp((5 - m) • ω • i)}', where '{Exp(5 • ω • i)} = {1} = E'. In this
regard,

'X_5† = X_0 = X_0† = X_5'. (13)

The procedure yields specifically 'A† = T' and 'C† = G'.
Clearly, the complement of 'E' is 'E' itself: 'E† = E'.
Another notation for the 'X_m' expressed as a base can be given. We introduce the
one-value function '^ωX(m)' that provides the same results,

'X_m = ^ωX(m) = E∘mω = E∘(ω•ω•...•ω (m times))' (m = 0, 1, 2, 3, 4, 5). (14)

As for 'm' in (14), both positive and negative integers are permissible. Thus,
'X_m†' is expressible as

X_m† = ^ωX(5-m) = E∘(5-m)ω = E∘(ω•ω•...•ω ('5-m' times)), (m = 0, 1, 2, 3, 4, 5). (15)

A simple example is illustrated below.

Suppose 'D_j = [A_1|T_2|C_3|E_4|G_5|T_6|...] = [^ωX(2)|^ωX(3)|^ωX(1)|^ωX(0)|^ωX(4)|^ωX(3)|...]'; then,

'D_j†' = [^ωX(5-2)|^ωX(5-3)|^ωX(5-1)|^ωX(5-0)|^ωX(5-4)|^ωX(5-3)|...],
= [^ωX(3)|^ωX(2)|^ωX(4)|^ωX(0)|^ωX(1)|^ωX(2)|...], (16)
= [T_1|A_2|G_3|E_4|C_5|A_6|...].
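
The complement rule X_m† = X_(5-m) is easy to sketch in code. The example below assumes the index assignment E = 0, C = 1, A = 2, T = 3, G = 4 used in (10) and applies the rule place by place, without reversing the order of the sequence; the names `complement_base` and `complement` are hypothetical.

```python
# Illustrative sketch of the complement rule X_m† = X_(5-m) (names are assumptions).
BASES = "ECATG"  # X_0 = E, X_1 = C, X_2 = A, X_3 = T, X_4 = G


def complement_base(base: str) -> str:
    """Complementary base, taking indices modulo 5 so that X_5 = X_0 = E."""
    m = BASES.index(base)
    return BASES[(5 - m) % 5]


def complement(seq: str) -> str:
    """Place-by-place complement of a sequence that may contain explicit 'E's."""
    return "".join(complement_base(base) for base in seq)


if __name__ == "__main__":
    print(complement("ATCEGT"))  # the first six places of the example in (16)
```

For the first six places of the example this gives the letters T, A, G, E, C, A, in agreement with the last line of (16).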

In accordance with the wallpaper group in Figure 1, the translations in one direction
(e.g., right) over a horizontal line form a cyclic group P_r that contains only {r, r^2, r^3, r^4,
r^e (= r^0 = r^5 = n)}. This group is isomorphic with group P_ω = {ω_1, ω_2, ω_3, ω_4, ω_0 (= ω_5)},
as is the group similarly generated over a vertical line.

Similar to '^ωX(m)', 'X_m' can be expressed using another one-value function '^rX(m) = E∘r^m':

X_m = ^rX(m) = E∘r^m = E∘(r•r•...•r (m times)) (m = 0, 1, 2, 3, 4, 5). (17)

Hence, 'X_m†' (the complementary base of 'X_m') is written as

X_m† = ^rX(5-m) = E∘r^(5-m) = E∘(r•r•...•r ('5-m' times)). (18)

Extension to vertical translations is straightforward;

'X_m = ^uX(m) = E∘u^m = E∘(u•u•...•u (m times))', (19)

and its complementary base 'X_m†' can be identified similarly, although the order of
letters is somewhat different.
Consider the following simple example in identifying 'D_j†' using '^rX(m)'s;

for 'D_j = [A_1|T_2|C_3|E_4|G_5|T_6|...] = [^rX(2)|^rX(3)|^rX(1)|^rX(0)|^rX(4)|^rX(3)|...]',

by replacing '^ωX(m)' by '^rX(m)' in formula (15), the same result is obtained.

According to these rules, 'X_1∘b_[X1→X4] = X_1∘r^3 = E∘r^4 = G'. In general, the i-th
component 'b_[Xm1→Xm2]i' of 'B_(m1→m2)' changes X_m1 (= E∘r^m1 = E∘(m_1 ω)) to X_m2
(= E∘r^m2 = E∘(m_2 ω)).
Hence, the highlighted form of the operator vector is expressed as

B_(m1→m2) = [...|b_[Xm1→Xm2]i|...] = [...|r^(m2-m1)_i|...] = [...|(m_2 - m_1)ω_i|...]. (20)

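For a further example, consider the operator $B_{(j \to k)}$ that changes $D_j$ to $D_k$, where

$$\begin{aligned}
D_j &= [C_1|A_2|E_3|\ldots|C_i|\ldots|T_{N-1}|A_N|E_{N+1}|E_{N+2}|E_{N+3}|\ldots] \\
&= [E \circ r^1{}_1|E \circ r^2{}_2|E \circ r^0{}_3|\ldots|E \circ r^1{}_i|\ldots|E \circ r^3{}_{N-1}|E \circ r^2{}_N|E \circ r^0{}_{N+1}|\ldots],
\end{aligned}$$

$$\begin{aligned}
D_k &= [G_1|T_2|C_3|\ldots|G_i|\ldots|C_{N-1}|T_N|E_{N+1}|E_{N+2}|E_{N+3}|\ldots] \\
&= [E \circ r^4{}_1|E \circ r^3{}_2|E \circ r^1{}_3|\ldots|E \circ r^4{}_i|\ldots|E \circ r^1{}_{N-1}|E \circ r^3{}_N|E \circ r^0{}_{N+1}|\ldots].
\end{aligned}$$

With details shown in Appendix D, and following (20), $B_{(j \to k)}$ takes the form

$$B_{(j \to k)} = [r^{4-1}{}_1|r^{3-2}{}_2|r^{1-0}{}_3|\ldots|r^{4-1}{}_i|\ldots|r^{1-3}{}_{N-1}|r^{3-2}{}_N|r^{0}{}_{N+1}|r^{0}{}_{N+2}|r^{0}{}_{N+3}|\ldots].$$

Naturally, the state $D_k$ is obtained through recursively applying the operations, $D_j \circ B_{(j \to k)} = D_k$. (Details are presented in Appendix D.)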
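Formula (20) amounts to subtracting base indices modulo 5, so the operator vector can be derived from any pair of equal-length sequences. The sketch below assumes the index convention E = 0, C = 1, A = 2, T = 3, G = 4 and represents each component $r^{m}$ by its exponent $m$; the function names are hypothetical and only the three explicit leading components of the example are used.

```python
# Assumed index convention: E=0, C=1, A=2, T=3, G=4.
LETTERS = "ECATG"
INDEX = {base: m for m, base in enumerate(LETTERS)}


def operator_vector(src: str, dst: str) -> list[int]:
    """Exponents m2 - m1 (mod 5) of r, one per position, as in formula (20)."""
    if len(src) != len(dst):
        raise ValueError("sequences must have equal length")
    return [(INDEX[d] - INDEX[s]) % 5 for s, d in zip(src, dst)]


def apply_operator(seq: str, exponents: list[int]) -> str:
    """Apply r**m to each base, i.e. add m to its index modulo 5."""
    if len(seq) != len(exponents):
        raise ValueError("operator vector must match sequence length")
    return "".join(LETTERS[(INDEX[b] + m) % 5] for b, m in zip(seq, exponents))


b_jk = operator_vector("CAE", "GTC")
print(b_jk)                          # [3, 1, 1]
print(apply_operator("CAE", b_jk))   # GTC
```

Negative differences are reduced modulo 5, which is why a change such as T to C appears as the exponent 3 rather than -2.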

Whereas the $D_j^{\dagger}$s might have components in reverse order in terms of sense (5' or 3'), there nevertheless exists a certain $D_k$ such that $D_k = D_j^{\dagger}$ $(j, k = 0, 1, 2, 3, 4, \ldots)$. With this, $D_j^{\dagger}$ is one of the ordinary elements belonging to the same set D. Thus, the symbol '$\dagger$' need only be present when elements are distinct.

§5 Further unifying notation to describe the wallpaper group operation 

Consider Figure 6; we assume that the number of right translations '$r$' $\in$ group $P_r$ (or $\omega \in$ group $P_{\omega}$) is '$a$' and the number of up translations '$u$' $\in$ group $P_u$ (or $2\omega\ (= \omega^2) \in$ group $P_{\omega}$) is '$b$' with $a, b = \ldots, -2, -1, 0, 1, 2, \ldots$. Similarly, with '$d \leftrightarrow 3\omega$' and '$l \leftrightarrow 4\omega$', the total change can be summarized as '$x[a, b]$'. We can confirm that there exists at least one pair '$a, b$' that satisfies

$$X = E \circ x[a, b], \tag{21}$$

because any base in Figure 6 can be obtained by a finite number of transitions from 'E'. For instance, A can be expressed as '$A = E \circ x[0,1] = E \circ x[2,0] = E \circ x[1,3] = E \circ x[3,2] = E \circ x[4,4]$'. However, we remark that '$x[a, b]$' means changes of bases from one to another prescribed by the wallpaper pattern. In practice, '$x[a, b] = r^a \bullet u^b$' constitutes a multiple composition of elements of group $Z_5$. In addition, '$x[-a, -b] = r^{-a} \bullet u^{-b} = l^a \bullet d^b$', or $\leftrightarrow$ '$(-a)\omega \bullet (-b)(2\omega) = (-a-2b)\omega$'. E.g., $x[-3, -2]$ means '$r^{-3} \bullet u^{-2} = l^3 \bullet d^2$', or $\leftrightarrow$ '$(-3)\omega \bullet (-2 \cdot 2)\omega = (-3-4)\omega = (-7)\omega = (-2)\omega = 3\omega = \omega^3$'.

For the wallpaper group, the '$a$' and '$b$' should be interpreted in modulo 5 addition. The Cayley table for the wallpaper group is presented in Appendix A.
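The correspondence $x[a, b] \leftrightarrow (a + 2b)\omega$ makes it possible to enumerate, for a given base, every pair within the unit cell. The following is a minimal sketch under that correspondence, again assuming the index convention E = 0, C = 1, A = 2, T = 3, G = 4; `x` and `pairs_in_unit_cell` are illustrative names.

```python
# Assumed index convention: E=0, C=1, A=2, T=3, G=4.
LETTERS = "ECATG"


def x(a: int, b: int) -> int:
    """Net rotation index of x[a, b] = r**a . u**b, using r <-> 1 and u <-> 2."""
    return (a + 2 * b) % 5


def pairs_in_unit_cell(base: str) -> list[tuple[int, int]]:
    """All (a, b) with 0 <= a, b <= 4 such that E . x[a, b] equals the base."""
    return [(a, b) for a in range(5) for b in range(5) if LETTERS[x(a, b)] == base]


print(pairs_in_unit_cell("A"))  # [(0, 1), (1, 3), (2, 0), (3, 2), (4, 4)]
print(x(-3, -2))                # 3, i.e. 3 omega
```

Each base, including E, has exactly five pairs in the unit cell, because for every $a$ there is a single $b$ modulo 5 that solves $a + 2b \equiv m$.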

Within 'the square unit cell' in Figure 1 or 6, there are five pairs of '$a, b$' for each base, as for A. Under modulo 5 addition, '$x[a+5, b+5] = x[a, b]$' holds. Moreover, if '$X^{\dagger}$' is obtained from '$X$' using '$X = E \circ x[a, b]$', '$X^{\dagger}$' can be determined as

$$X^{\dagger} = E \circ x^{-1}[a, b] = E \circ x[-a, -b], \tag{22}$$

or

$$X^{\dagger} = E \circ x[5-a, 5-b], \tag{23}$$

because '$X$' and '$X^{\dagger}$' are symmetrically disposed with respect to 'E' over the wallpaper pattern that would be selected as a standard for the definition of '$a, b$'. In practice, for an arbitrary '$X$', '$X^{\dagger}$' can be obtained via (22) or (23) by making use of an arbitrary 'E' as the reference point for the symmetry.

For example, if '$G = E \circ x[2,1]$', then according to (22) '$G^{\dagger} = E \circ x^{-1}[2,1] = E \circ x[-2,-1] = C$', or according to (23) '$G^{\dagger} = E \circ x[5-2, 5-1] = E \circ x[3,4] = C$'. There are an infinite number of identifications for the complementary base for an arbitrary base '$X$'.
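Formulas (22) and (23) can be compared directly. The sketch below assumes the correspondence $x[a, b] \leftrightarrow (a + 2b)\omega$ and the index convention E = 0, C = 1, A = 2, T = 3, G = 4; the helper names are hypothetical.

```python
# Assumed index convention: E=0, C=1, A=2, T=3, G=4.
LETTERS = "ECATG"


def x(a: int, b: int) -> int:
    """Net rotation index of x[a, b], using r <-> 1 and u <-> 2 (mod 5)."""
    return (a + 2 * b) % 5


def complement_22(a: int, b: int) -> str:
    """Complementary base via (22): E . x[-a, -b]."""
    return LETTERS[x(-a, -b)]


def complement_23(a: int, b: int) -> str:
    """Complementary base via (23): E . x[5 - a, 5 - b]."""
    return LETTERS[x(5 - a, 5 - b)]


# G corresponds to index 4, which is reached for instance by (a, b) = (2, 1).
print(LETTERS[x(2, 1)], complement_22(2, 1), complement_23(2, 1))  # G C C
```

Both routes agree for every integer pair because $5 - a \equiv -a$ and $5 - b \equiv -b$ modulo 5.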

Moreover, if we define the composition for the $x[a, b]$s as

$$x[a_1, b_1] \bullet x[a_2, b_2] = x[a_1, b_1] + x[a_2, b_2] = x[a_1 + a_2, b_1 + b_2], \tag{24}$$

we can confirm descriptions (20), (22) and (23). As for the operators $B_m$, with $D_j$ expressed as $D_j = [\ldots|E \circ x[a_i, b_i]_i|\ldots]$, one of the candidates of the appropriate $B_{(j \to j^{\dagger})}$s that produces $D_j \circ B_{(j \to j^{\dagger})} = D_j^{\dagger}$ is identified:

$$B_{(j \to j^{\dagger})} = [\ldots|x[-2a_i, -2b_i]_i|\ldots] = [\ldots|r^{-2a_i} \bullet u^{-2b_i}{}_i|\ldots] = [\ldots|l^{2a_i} \bullet d^{2b_i}{}_i|\ldots], \tag{25a}$$

or

$$= [\ldots|r^{-2a_i} \bullet (r^2)^{-2b_i}{}_i|\ldots] = [\ldots|r^{-2a_i - 4b_i}{}_i|\ldots] = [\ldots|l^{2a_i + 4b_i}{}_i|\ldots] \quad (\text{using } u = r^2,\ d = l^2), \tag{25b}$$

or

$$\leftrightarrow [\ldots|(-2a_i)\omega \bullet (-2b_i)(2\omega)_i|\ldots] = [\ldots|(-2a_i - 4b_i)\omega_i|\ldots] \quad (\text{using } r^m \leftrightarrow m\omega,\ u^m \leftrightarrow 2m\omega). \tag{25c}$$



The exponents '$-2a_i - 4b_i$' in (25b) are permitted to take positive or negative integer values.

In these expressions, the rules for the wallpaper group (25a) can also be expressed as rules for either the linear group (25b) or the rotational group (25c).
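More generally, $D_j = [\ldots|E \circ x[a_{(j)i}, b_{(j)i}]_i|\ldots]$ is changed into $D_k = [\ldots|E \circ x[a_{(k)i}, b_{(k)i}]_i|\ldots]$, and the $B_{(j \to k)}$ that provides $D_j \circ B_{(j \to k)} = D_k$ is identified as

$$B_{(j \to k)} = [\ldots|x[a_{(k)i} - a_{(j)i},\ b_{(k)i} - b_{(j)i}]_i|\ldots] = [\ldots|r^{a_{(k)i} - a_{(j)i}} \bullet u^{b_{(k)i} - b_{(j)i}}{}_i|\ldots] \quad (= [\ldots|l^{-a_{(k)i} + a_{(j)i}} \bullet d^{-b_{(k)i} + b_{(j)i}}{}_i|\ldots]). \tag{26a}$$

Also,

$$= [\ldots|r^{a_{(k)i} - a_{(j)i}} \bullet (r^2)^{b_{(k)i} - b_{(j)i}}{}_i|\ldots] = [\ldots|r^{a_{(k)i} - a_{(j)i} + 2b_{(k)i} - 2b_{(j)i}}{}_i|\ldots], \tag{26b}$$

or else

$$\leftrightarrow [\ldots|(a_{(k)i} - a_{(j)i})\omega \bullet 2(b_{(k)i} - b_{(j)i})\omega_i|\ldots] = [\ldots|(a_{(k)i} - a_{(j)i} + 2b_{(k)i} - 2b_{(j)i})\omega_i|\ldots]. \tag{26c}$$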

As mentioned in §1, if a certain sequence '$X$' has sense <5' → 3'>, its complementary sequence '$X^{\dagger}$' is reversed to <3' → 5'>.
To aid understanding, we present the following examples: Given 



Sawamura et al. Theoretical Biology and Medical Modelling 2014, 11:18
http://www.tbiomed.com/content/11/1/18



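$$\begin{aligned}
D_j &= [A_1|T_2|E_3|\ldots|C_i|\ldots|G_{N-1}|A_N|E_{N+1}|E_{N+2}|E_{N+3}|\ldots] \\
&= [E \circ x[0,1]_1|E \circ x[0,-1]_2|E \circ x[0,0]_3|\ldots|E \circ x[1,0]_i|\ldots \\
&\qquad \ldots|E \circ x[-1,0]_{N-1}|E \circ x[0,1]_N|E \circ x[0,0]_{N+1}|E \circ x[0,0]_{N+2}|E \circ x[0,0]_{N+3}|\ldots];
\end{aligned}$$

then, according to (22), $D_j^{\dagger}$ is simply

$$\begin{aligned}
D_j^{\dagger} &= [E \circ x^{-1}[0,1]_1|E \circ x^{-1}[0,-1]_2|E \circ x^{-1}[0,0]_3|\ldots \\
&\qquad \ldots|E \circ x^{-1}[1,0]_i|\ldots|E \circ x^{-1}[-1,0]_{N-1}|E \circ x^{-1}[0,1]_N|E \circ x^{-1}[0,0]_{N+1}|E \circ x^{-1}[0,0]_{N+2}|E \circ x^{-1}[0,0]_{N+3}|\ldots] \\
&= [E \circ x[0,-1]_1|E \circ x[0,1]_2|E \circ x[0,0]_3|\ldots \\
&\qquad \ldots|E \circ x[-1,0]_i|\ldots|E \circ x[1,0]_{N-1}|E \circ x[0,-1]_N|E \circ x[0,0]_{N+1}|E \circ x[0,0]_{N+2}|E \circ x[0,0]_{N+3}|\ldots] \\
&= [T_1|A_2|E_3|\ldots|G_i|\ldots|C_{N-1}|T_N|E_{N+1}|E_{N+2}|E_{N+3}|\ldots].
\end{aligned}$$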

If we use the optional formula (25a-c), the relation '$D_j \circ B_{(j \to j^{\dagger})} = D_j^{\dagger}$' is derived. Details are given in Appendix E.

Apart from these examples, additional identities for the wallpaper group can be verified using Figure 1 or 6; e.g.,

$$x[1,0] \bullet x[0,1] = x[1,0] + x[0,1]\ (= r \bullet u) = x[1+0, 0+1] = x[1,1] = x[0,-1]\ (= d),$$

$$x[2,0]\ (= r \bullet r) = x[0,1] = u,$$

$$x[3,1] \bullet x[-1,10] = x[3-1, 1+10] = x[2,11] = x[2,1] = r^2 \bullet u^1 = (r \bullet r) \bullet u = u \bullet u = l.$$

We develop various general formulas:

$$\begin{aligned}
x[a+5, b] &= x[a, b], & x[a, b+5] &= x[a, b], \\
x[2a, -a] &= x[0,0], & x[a, 2a] &= x[0,0], \\
x[-2a, a] &= x[0,0], & x[-a, -2a] &= x[0,0].
\end{aligned} \tag{27}$$

Other unknown rules might underlie the wallpaper pattern. 
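The additive composition (24) and the general formulas (27) lend themselves to a brute-force check over a finite window of integers. This is a sketch only, assuming the correspondence $x[a, b] \leftrightarrow (a + 2b)\omega$ with indices modulo 5; `compose` and `identities_hold` are illustrative names, and a finite check is of course not a proof.

```python
def x(a: int, b: int) -> int:
    """Net rotation index of x[a, b], using r <-> 1 and u <-> 2 (mod 5)."""
    return (a + 2 * b) % 5


def compose(p: tuple[int, int], q: tuple[int, int]) -> tuple[int, int]:
    """Composition (24): x[a1, b1] . x[a2, b2] = x[a1 + a2, b1 + b2]."""
    return (p[0] + q[0], p[1] + q[1])


def identities_hold(limit: int = 10) -> bool:
    """Check the formulas in (27) for all |a|, |b| <= limit."""
    for a in range(-limit, limit + 1):
        if any(v != 0 for v in (x(2 * a, -a), x(a, 2 * a), x(-2 * a, a), x(-a, -2 * a))):
            return False
        for b in range(-limit, limit + 1):
            if x(a + 5, b) != x(a, b) or x(a, b + 5) != x(a, b):
                return False
    return True


example = compose((3, 1), (-1, 10))
print(identities_hold(), example, x(*example))  # True (2, 11) 4
```

The final value 4 corresponds to $4\omega$, that is, to the operation $l$, in agreement with the worked identity for $x[3,1] \bullet x[-1,10]$.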

Concerning style in treating the wallpaper group, examples $X_m = {}^{r}X(m)$ in (16, 17, 19), and ${}^{\omega}X(m)$ in (14, 15, 20) could be regarded as a specific combination that is displayed as

$$X_m = {}^{r}X(m) = {}^{\omega}X(m) = E \circ x[a, 0] \quad (a = \ldots, -2, -1, 0, 1, 2, \ldots, 5, 6, \ldots;\ \text{integer}). \tag{28}$$



§6 Treatment of changes of sequences and the insertion/deletion of DNA 
bases via an optionally generalized operation 

Below, we demonstrate, using several examples containing 'E's, changes and inclusion/exclusion of DNA bases using a more generalized scheme.

For definiteness, let $D_j$ be the sequence 'CGTAT...C...TA'; we consider the change of its '1-3' components '$C_1G_2T_3$' into '$G_1T_2A_3$', and moreover the insertion of two bases 'GC' between '$T_3$' and '$A_4$', denoted '( )':

$$D_j = [C_{(j)1}|G_{(j)2}|T_{(j)3}(\ )A_{(j)4}|\ldots|C_{(j)i}|\ldots|T_{(j)N-1}|A_{(j)N}|E_{(j)N+1}|E_{(j)N+2}|E_{(j)N+3}|\ldots].$$

We denote the result of this transformation as $D_k$,

$$D_k = [G_{(k)1}|T_{(k)2}|A_{(k)3}(G_{(k)4}|C_{(k)5})A_{(k)6}|\ldots|C_{(k)i+2}|\ldots|T_{(k)N+1}|A_{(k)N+2}|E_{(k)N+3}|E_{(k)N+4}|E_{(k)N+5}|\ldots]. \tag{29}$$

The procedure from $D_j$ to $D_k$ is described recursively to find operator $B_{(j \to h)}$.
First, two 'E's are inserted after the 3rd component (this change is denoted '$D_j \to D_h$') in preparation for insertion of 'GC':

$$D_j \to D_h = [C_{(h)1}|G_{(h)2}|T_{(h)3}(E_{(h)4}|E_{(h)5})A_{(h)4+2}|\ldots|C_{(h)i+2}|\ldots|T_{(h)N-1+2}|A_{(h)N+2}|E_{(h)N+1+2}|E_{(h)N+2+2}|E_{(h)N+3+2}|\ldots].$$



Thus, 'B 0 _> h) 

= [b[C^C]l|b[G^G]2|b[ T ^T]3(b[G^E]4^ 

This change is in accordance with those rules for vector-like 'Dj's dependent upon 
'E's. 

Hence, the operator $B_{(h \to k)}$ that produces the change from $D_h$ to $D_k$ is described as:

$$B_{(h \to k)} = [b_{[C \to G]1}|b_{[G \to T]2}|b_{[T \to A]3}(b_{[E \to G]4}|b_{[E \to C]5})b_{[A \to A]6}|\ldots|b_{[C \to C]i+2}|\ldots|b_{[T \to T]N+1}|b_{[A \to A]N+2}|b_{[E \to E]N+3}|b_{[E \to E]N+4}|b_{[E \to E]N+5}|\ldots].$$

Thereby,

$$\begin{aligned}
D_h \circ B_{(h \to k)} &= [C_1 \circ b_{[C \to G]1}|G_2 \circ b_{[G \to T]2}|T_3 \circ b_{[T \to A]3}(E_4 \circ b_{[E \to G]4}|E_5 \circ b_{[E \to C]5})A_6 \circ b_{[A \to A]6}|\ldots|C_{i+2} \circ b_{[C \to C]i+2}|\ldots \\
&\qquad \ldots|T_{N+1} \circ b_{[T \to T]N+1}|A_{N+2} \circ b_{[A \to A]N+2}|E_{N+3} \circ b_{[E \to E]N+3}|E_{N+4} \circ b_{[E \to E]N+4}|E_{N+5} \circ b_{[E \to E]N+5}|\ldots].
\end{aligned} \tag{32}$$

With reference to Figure 1, 2, 6 or Appendix A,

$$\begin{aligned}
&= [C_1 \circ d_1|G_2 \circ l_2|T_3 \circ l_3(E_4 \circ l_4|E_5 \circ r_5)A_6 \circ n_6|\ldots|C_{i+2} \circ n_{i+2}|\ldots \\
&\qquad \ldots|T_{N+1} \circ n_{N+1}|A_{N+2} \circ n_{N+2}|E_{N+3} \circ n_{N+3}|E_{N+4} \circ n_{N+4}|E_{N+5} \circ n_{N+5}|\ldots],
\end{aligned} \tag{33}$$

$$= [G_1|T_2|A_3(G_4|C_5)A_6|\ldots|C_{i+2}|\ldots|T_{N+1}|A_{N+2}|E_{N+3}|E_{N+4}|E_{N+5}|\ldots], \tag{34}$$

$= D_k$ (29).

This indicates a code change of the '1-3' components and a 'GC' insertion after the 3rd component, described via two steps: 1) $D_j \to D_h$ (inserting two 'E's after the 3rd component), and 2) $D_h \circ B_{(h \to k)} = D_k$. Note that the exclusion of the '4-5' components 'GC' from $D_k$ and the transformation of the '1-3' components from 'GTA' to 'CGT' constitute the recursive procedure for the inverse operator

$$B_{(k \to h)} = B_{(h \to k)}^{-1}. \tag{35}$$

Alternatively, '$D_h \to D_j$' is obtained by deleting the two 'E's from the '4-5' components of $D_h$ to yield the initial state $D_j$, in accordance with the characteristics of the vector-like $D_j$s.
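The two-step procedure (insert 'E's, then apply one operator vector) can be traced on the explicit leading components 'CGTA' of the example. The following is a minimal sketch, assuming the index convention E = 0, C = 1, A = 2, T = 3, G = 4 and the correspondences n = 0, r = 1, u = 2, d = 3, l = 4; the function names and the string encoding of operator vectors are hypothetical.

```python
# Assumed conventions: bases E=0, C=1, A=2, T=3, G=4; operations n=0, r=1, u=2, d=3, l=4.
LETTERS = "ECATG"
INDEX = {base: m for m, base in enumerate(LETTERS)}
SHIFT = {"n": 0, "r": 1, "u": 2, "d": 3, "l": 4}
INVERSE = {"n": "n", "r": "l", "l": "r", "u": "d", "d": "u"}


def insert_e(seq: str, position: int, count: int) -> str:
    """Insert `count` letters E after the 1-based component `position`."""
    return seq[:position] + "E" * count + seq[position:]


def apply_ops(seq: str, ops: str) -> str:
    """Apply one Z5 operation per component."""
    if len(seq) != len(ops):
        raise ValueError("operator vector must match sequence length")
    return "".join(LETTERS[(INDEX[b] + SHIFT[o]) % 5] for b, o in zip(seq, ops))


d_j = "CGTA"
d_h = insert_e(d_j, 3, 2)            # step 1: two E's after the 3rd component
b_hk = "dlllrn"                      # step 2: components as in (33)
d_k = apply_ops(d_h, b_hk)
b_kh = "".join(INVERSE[o] for o in b_hk)
print(d_h, d_k, apply_ops(d_k, b_kh))  # CGTEEA GTAGCA CGTEEA
```

Inverting each component of the operator vector recovers $D_h$ from $D_k$, which is the content of (35).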

In summary, essentially all transitions (changes and inclusion/exclusion) of a certain sequence within the same single-stranded DNA, whether it has finite or infinite length, can be described in principle within a single operation using only the unique operator $B_{(\ )} \in$ group B.
§7 Synthesis of changes, insertion/deletion, and recombination of DNA bases

As a further development, to demonstrate recombination, take two finite sequences 'GETAGT' (= $D_{c1}$) and 'ATAGCTA' (= $D_{d1}$). These have vector expressions

$$\begin{aligned}
D_{c1} &= [G_1E_2T_3A_4G_5T_6|E_7E_8E_9\ldots] \\
&= [G_{(c1)1}|E_{(c1)2}|T_{(c1)3}|A_{(c1)4}|G_{(c1)5}|T_{(c1)6}|E_{(c1)7}|E_{(c1)8}|E_{(c1)9}|\ldots],
\end{aligned} \tag{36}$$

$$\begin{aligned}
D_{d1} &= [A_1T_2A_3G_4C_5T_6A_7|E_8E_9E_{10}\ldots] \\
&= [A_{(d1)1}|T_{(d1)2}|A_{(d1)3}|G_{(d1)4}|C_{(d1)5}|T_{(d1)6}|A_{(d1)7}|E_{(d1)8}|E_{(d1)9}|E_{(d1)10}|\ldots].
\end{aligned} \tag{37}$$

To illustrate for the pair $D_{c1}$ and $D_{d1}$, we consider recombination to take place between the sequence '$T_3A_4G_5T_6$' of the (3-6)-th components of $D_{c1}$ and the sequence 'AGCTA' of the (3-7)-th components of $D_{d1}$ at the same instant.

First, in the pair of sequences, a series of 'E's of complementary size is inserted in $D_{c1}$ just before the sequence to be converted, and in $D_{d1}$ just after the sequence to be converted. For example, for $D_{c1}$, five 'E's, 'EEEEE', of size equivalent to that of '$A_3G_4C_5T_6A_7$' of $D_{d1}$, are inserted just before '$T_3$' in $D_{c1}$ where '$A_3G_4C_5T_6A_7$' is to be located, that is, in the interval between the 2nd '$E_2$' and 3rd '$T_3$' within $D_{c1}$. Under this procedure, $D_{c1}$ changes into $D_{c2}$:

$$\begin{aligned}
D_{c2} &= [G_1E_2(EEEEE)T_{3+5}A_{4+5}G_{5+5}T_{6+5}|E_{7+5}E_{8+5}E_{9+5}\ldots] \\
&= [G_1E_2(EEEEE)T_8A_9G_{10}T_{11}|E_{12}E_{13}E_{14}\ldots].
\end{aligned} \tag{38}$$

Here, we assume that 'EEEEE' is changed into '$A_3G_4C_5T_6A_7$' (originally, the (3-7)-th components of $D_{d1}$). In addition, '$T_8A_9G_{10}T_{11}$' is transformed into the same number of 'E's, 'EEEE', at the same time. By this process, $D_{c2}$ changes into $D_{c3}$:

$$D_{c3} = [G_1E_2(A_3G_4C_5T_6A_7)E_8E_9E_{10}E_{11}|E_{12}E_{13}E_{14}\ldots]. \tag{39}$$

Note that bold type and underline are here merely pedagogical aids to help identify sequence changes. Meanwhile, four 'E's, 'EEEE', equivalent in size to '$T_3A_4G_5T_6$' of $D_{c1}$ would be inserted after '$A_7$' of $D_{d1}$, where '$T_3A_4G_5T_6$' of $D_{c1}$ is to be located within $D_{d1}$. That is, '$T_3A_4G_5T_6$' is inserted into the interval between the 7th '$A_7$' and 8th '$E_8$' within $D_{d1}$. In this procedure, $D_{d1}$ changes into $D_{d2}$:

$$\begin{aligned}
D_{d2} &= [A_1T_2A_3G_4C_5T_6A_7(EEEE)E_{8+4}E_{9+4}E_{10+4}\ldots] \\
&= [A_1T_2A_3G_4C_5T_6A_7(EEEE)E_{12}E_{13}E_{14}\ldots].
\end{aligned} \tag{40}$$

Furthermore, we change 'EEEE' into the equivalent-sized '$T_8A_9G_{10}T_{11}$' (originally, the (3-6)-th components of $D_{c1}$) while '$A_3G_4C_5T_6A_7$' is transformed into the equivalent-sized 'EEEEE'. Through this procedure, $D_{d2}$ changes into $D_{d3}$:

$$D_{d3} = [A_1T_2E_3E_4E_5E_6E_7(T_8A_9G_{10}T_{11})E_{12}E_{13}E_{14}\ldots]. \tag{41}$$

As a result, if we omit the infinite series of 'E's from the right end, we have the recombination (partial conversion between this pair of sequences '$D_{c1}, D_{d1}$') with $D_{c1}$ = '$G_1E_2T_3A_4G_5T_6$' being transformed into $D_{c3}$ = '$G_1E_2A_3G_4C_5T_6A_7$' and $D_{d1}$ = '$A_1T_2A_3G_4C_5T_6A_7$' being transformed into $D_{d3}$ = '$A_1T_2T_3A_4G_5T_6$'. We define the manipulation of the recombination (partial/total conversion) between '$D_{c1}, D_{d1}$' in this way.

In the initial stage in the previous illustration, we inserted different sizes of 'E' sequences in each line; however, the processes '$D_{c1} \to D_{c2}$' and '$D_{d1} \to D_{d2}$' are preferably regarded as 'E' insertions/deletions (see comments prior to equation (5)), and this rule depends upon the characteristics of these vectors (e.g., the $D_j$s).

As previously explained, the operations can be performed in any of the three equivalent groups: the linear group, the rotational group, and the wallpaper group. Choosing the wallpaper group,

$$\begin{aligned}
D_{c2} \circ B_{(c2 \to c3)} &= [G_1 \circ n_1|E_2 \circ n_2(E_3 \circ u_3|E_4 \circ l_4|E_5 \circ r_5|E_6 \circ d_6|E_7 \circ u_7)T_8 \circ u_8|A_9 \circ d_9|G_{10} \circ r_{10}|T_{11} \circ u_{11}|E_{12} \circ n_{12}|E_{13} \circ n_{13}|E_{14} \circ n_{14}|\ldots] \\
&= [G_1|E_2(A_3|G_4|C_5|T_6|A_7)E_8|E_9|E_{10}|E_{11}|E_{12}|E_{13}|E_{14}|\ldots] \\
&= D_{c3},
\end{aligned} \tag{42}$$

where

$$B_{(c2 \to c3)} = [n_1|n_2(u_3|l_4|r_5|d_6|u_7)u_8|d_9|r_{10}|u_{11}|n_{12}|n_{13}|n_{14}|\ldots]. \tag{43}$$

Also,

$$\begin{aligned}
D_{d2} \circ B_{(d2 \to d3)} &= [A_1 \circ n_1|T_2 \circ n_2|A_3 \circ d_3|G_4 \circ r_4|C_5 \circ l_5|T_6 \circ u_6|A_7 \circ d_7(E_8 \circ d_8|E_9 \circ u_9|E_{10} \circ l_{10}|E_{11} \circ d_{11})E_{12} \circ n_{12}|E_{13} \circ n_{13}|E_{14} \circ n_{14}|\ldots] \\
&= [A_1|T_2|E_3|E_4|E_5|E_6|E_7(T_8|A_9|G_{10}|T_{11})E_{12}|E_{13}|E_{14}|\ldots],
\end{aligned} \tag{44}$$

where

$$B_{(d2 \to d3)} = [n_1|n_2|d_3|r_4|l_5|u_6|d_7(d_8|u_9|l_{10}|d_{11})n_{12}|n_{13}|n_{14}|\ldots]. \tag{45}$$

With respect to (42) and (44), the inverse identities are confirmed:

$$B_{(c2 \to c3)}^{-1} = B_{(d2 \to d3)}. \tag{46}$$

Generally, the operator $B$ giving the transition '$D_{c2} \to D_{c3}$' automatically produces an inverse change for '$D_{d2} \to D_{d3}$', as stated in (46), and reduces troublesome manipulations, even if only partially.
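The recombination example can be replayed on the finite parts of the two sequences to confirm that one operator vector is the component-wise inverse of the other. The sketch below assumes the index convention E = 0, C = 1, A = 2, T = 3, G = 4 and the correspondences n = 0, r = 1, u = 2, d = 3, l = 4; trailing 'E's are truncated, and all names are illustrative.

```python
# Assumed conventions: bases E=0, C=1, A=2, T=3, G=4; operations n=0, r=1, u=2, d=3, l=4.
LETTERS = "ECATG"
INDEX = {base: m for m, base in enumerate(LETTERS)}
SHIFT = {"n": 0, "r": 1, "u": 2, "d": 3, "l": 4}
INVERSE = {"n": "n", "r": "l", "l": "r", "u": "d", "d": "u"}


def apply_ops(seq: str, ops: str) -> str:
    """Apply one Z5 operation per component."""
    if len(seq) != len(ops):
        raise ValueError("operator vector must match sequence length")
    return "".join(LETTERS[(INDEX[b] + SHIFT[o]) % 5] for b, o in zip(seq, ops))


d_c1, d_d1 = "GETAGT", "ATAGCTA"
d_c2 = d_c1[:2] + "E" * 5 + d_c1[2:]   # five E's before T3
d_d2 = d_d1 + "E" * 4                  # four E's after A7
b_c = "nnulrduudru"                    # leading components of B(c2->c3)
b_d = "".join(INVERSE[o] for o in b_c) # component-wise inverse

d_c3 = apply_ops(d_c2, b_c)
d_d3 = apply_ops(d_d2, b_d)
print(d_c3, d_d3, b_d)  # GEAGCTAEEEE ATEEEEETAGT nndrludduld
```

Removing the trailing 'E's from the first result and all explicit 'E's from the second yields 'GEAGCTA' and 'ATTAGT', the recombined sequences stated in the text.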



§8 Further applications of the composition category-like prototypal model 
using additional ribonucleic acid (RNA) 

We next comment on other possible applications of the model. The category theory- 
like construction for treating DNA transcription to RNA might be conceivable, and the 
combination of the set and the group can comprise a category when these satisfy cat- 
egory theory postulates [48,49]. That is because we believe that in future developments 
the discussion should embrace category theory as one of the important options. 

To begin, according to our description for handling 'E's, it seems difficult to define inverse elements in a group-theoretical way when there are deletions of 'E's from any place in a sequence, because we cannot find sufficient numbers of 'E's in the target component of $D_j$.

Thus, we consider the morphism f that transforms the sequence of DNA bases within set D as follows [48,49].

morphism $f : X \to X$, dom(f) = $D_j$, cod(f) = $D_k$. Object 'X' is the set of $D_j$s. There exists a morphism '$1_X$' such that '$1_X \bullet f = f = f \bullet 1_X$' for every morphism f, when '$1_X$' = $[n_1|n_2|n_3|\ldots|n_i|\ldots|n_{N-1}|n_N|\ldots]$ ($\in$ group B). If supplemented, the morphisms f comprise 'group B' (see reference list in Figure 7). The group composition for '$f_1$' and '$f_2$' is denoted '$f_1 \bullet f_2$'.

As mentioned earlier, sequences of DNA consisting of bases 'C, A, T, and G' are transcribed into RNA consisting of 'C, A, U, and G'. This process can be regarded as the combination of two manipulations: I) transcription from the original DNA sequences ($D_j$) to those of its complement ($D_j^{\dagger}$), and II) alteration from 'C, A, T, and G' to 'C, A, U, and G' ($D_j^{\dagger} \to R_j^{\dagger}$) (both are illustrated in Figure 8).

Step I 

The transformation from the original DNA bases $D_j$ into the complementary sequence $D_j^{\dagger}$ (e.g., 'TCATEAGCTGA...' → 'AGTAETCGACT...') (for transcription to pre-messenger RNA (pre-mRNA) before splicing) can be performed via the manipulations (13-15, 17, 18, 20, 22, 23, 25a-c) in §3 and §4. $D_j^{\dagger}$ can be obtained via the linear group (17, 18, 20), the rotational group (14, 15, 20) and also the wallpaper group (21-23, 25a-c, 26a-c). Thereby, morphism $p : X \to Y$, dom(p) = $D_j$, cod(p) = $D_j^{\dagger}$. Object Y is the set of $D_j^{\dagger}$s (essentially equivalent to the set of $D_j$s).

There exist morphisms '$1_X$' and '$1_Y$' such that '$1_Y \bullet p = p = p \bullet 1_X$' for morphism p, where

$$1_X = 1_Y = [n_1|n_2|n_3|\ldots|n_i|\ldots|n_{N-1}|n_N|\ldots]. \tag{47}$$

However, in practice, morphism p is one of the $B_m$s $\in$ group B (see Figures 7 and 8).



Morphism f: X → X (f: $D_j \to D_k$). Source object: X (= set D), target object: X (= set D), dom(f) = $D_j$, cod(f) = $D_k$. Practically, morphism f is one of the $B_m$s ($\in$ group B).

Morphism p: X → Y (p: $D_j \to D_j^{\dagger}$). Source object: X (= set D), target object: Y (= set D), dom(p) = $D_j$, cod(p) = $D_j^{\dagger}$. Practically, morphism p is one of the $B_m$s ($\in$ group B).

Morphism τ: Y → Z (τ: $D_j^{\dagger} \to R_j^{\dagger}$). Source object: Y (= set D), target object: Z (= set R), dom(τ) = $D_j^{\dagger}$, cod(τ) = $R_j^{\dagger}$.

Morphism g = p•τ: X → Z (g: $D_j \to R_j^{\dagger}$). Source object: X (= set D), target object: Z (= set R), dom(g) = $D_j$, cod(g) = $R_j^{\dagger}$.

Morphism h: Z → Z (h: '$R_j^{\dagger} \to R_k^{\dagger}$' or '$R_j \to R_k$'). Source object: Z (= set R), target object: Z (= set R), dom(h) = $R_j^{\dagger}$, cod(h) = $R_k^{\dagger}$ ($R_j^{\dagger}, R_j \in$ set R). Like morphism f, morphism h is one of the $B_m$s ($\in$ group B).

Morphism j: Z → Zs (j: '$R_j^{\dagger} \to Rs_j^{\dagger}$' or '$R_j \to Rs_j$'). Source object: Z (= set R), target object: Zs (= set Rs), dom(j) = $R_j^{\dagger}$, cod(j) = $Rs_j^{\dagger}$ ($Rs_j^{\dagger}, Rs_j \in$ set Rs).

Figure 7 Definition of category C. A simplistic definition of category C treating the traditional "central dogma" is presented. In practice, any selection of operations (morphisms) is permissible because composition within the category C using any of the three operations belonging to the 'linear group', 'rotational group' or 'wallpaper group' is considered possible. The morphisms f and h correspond to elements $B_m$ of group B, the only difference being the 'T's and 'U's. Actually, morphism j is regarded as part of morphism h ($R_j^{\dagger}$s and 'set R' are also substitutable for $Rs_j^{\dagger}$s and 'set Rs'); all morphisms except for 'τ' and 'g' satisfy the group postulates, and are treated as operations of group B.






[Figure 8 diagram. Object X (set D; DNAs): $D_j$ = [T|C|A|T|E|A|G|C|T|G|A|...], labelled with morphism f. Morphism p leads to Object Y (set D; complementary DNAs): $D_j^{\dagger}$ = [A|G|T|A|E|T|C|G|A|C|T|...], labelled with (morphism f). Morphism τ leads to Object Z (set R; pre-mRNAs): $R_j^{\dagger}$ = [A|G|U|A|E|U|C|G|A|C|U|...], labelled with morphism h; the composite g = p•τ is marked as the procedure for transcription from a primary single-strand DNA to a pre-mRNA. Morphism j leads to Object Zs (set Rs; mRNAs): $Rs_j^{\dagger}$ = [A|E|E|E|E|U|C|G|A|C|U|...] (e.g., intron: 2nd-4th bases 'GUA'), marked as the procedure for splicing from a pre-mRNA to an mRNA. The chart ends towards translation of RNA for synthesis of protein with (e.g.) <$Rs_j^{\dagger}$> = [AUCGACU...].]

Figure 8 Example of a category theory-like scheme on canonical "central dogma" and reference chart. The features of DNA sequencing involving changes, insertion/deletion, and recombination can receive a group-theoretic treatment. A single-stranded $D_j$ ("sense" of double-stranded DNA) is transformed into its complementary sequence $D_j^{\dagger}$ (same as its 'anti-sense') or remains in the same $D_j$ by morphism p. A single-stranded $D_j^{\dagger}$ (same as the "anti-sense" of $D_j$), without discrimination for directions (<5' → 3'> or <3' → 5'>), is transformed into its pre-messenger RNA (pre-mRNA; $R_j^{\dagger}$) containing 'introns' that are not used in protein synthesis, or remains in the same $D_j^{\dagger}$ by morphism τ. Then, $D_j^{\dagger}$ is transcribed into mature RNA (mRNA; $Rs_j^{\dagger}$) through RNA splicing. After either the simultaneous deletion of all explicit 'E's from the $Rs_j^{\dagger}$s, <$Rs_j^{\dagger}$>$_1$ (= <$Rs_j^{\dagger}$>), or through a sequence of deletions, <$Rs_j^{\dagger}$>$_t$ (t = 0, 1, 2, ...) with additional indels, protein synthesis can be described in subsequent procedures of this scheme. Objects X and Y are in set D, and object Z is in set R; however, object Zs is in set Rs, which is a part of set Z. Over set Rs, group operations are not definable at present.



Step II 

Next, we define manipulations that change the above $D_j^{\dagger}$ into $R_j^{\dagger}$, where all 'T's are converted into 'U's; e.g., '($D_j^{\dagger}$ =) [A|G|T|A|E|T|C|G|A|C|T|...] → ($R_j^{\dagger}$ =) [A|G|U|A|E|U|C|G|A|C|U|...]'.

This process can also be expressed in a similar way as transcription: morphism $\tau : Y \to Z$, dom(τ) = $D_j^{\dagger}$, cod(τ) = $R_j^{\dagger}$. Object 'Z' is the set of $R_j^{\dagger}$s. There exist morphisms '$1_Y$' and '$1_Z$' such that '$1_Z \bullet \tau = \tau = \tau \bullet 1_Y$' for every morphism τ, where

$$1_Y = 1_Z = [n_1|n_2|n_3|\ldots|n_i|\ldots|n_{N-1}|n_N|\ldots] \tag{48}$$

(refer to Figures 7 and 8).

Evidently, morphism τ does not satisfy the group postulates because the source object 'Y' and target object 'Z' are different and a single set of operations cannot be defined at this stage.

Additionally, as for Steps I and II, the resultant process for morphisms p and τ can be expressed as:

$$\text{morphism } g = p \bullet \tau : X \to Z, \tag{49}$$

dom(g) = $D_j$, cod(g) = $R_j^{\dagger}$ (see Figures 7 and 8).

There exist morphisms '$1_X$' and '$1_Z$' such that '$1_Z \bullet g = g = g \bullet 1_X$'. (50)

The only difference between $D_j^{\dagger}$ and $R_j^{\dagger}$ is the appearance of 'T' and 'U' in the sequences.
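Steps I and II can be sketched as two string maps and their composite. This is an illustration only, assuming sequences are written as plain strings over the five letters; the function names `p`, `tau` and `g` mirror the morphism names but are otherwise hypothetical.

```python
# Step I: complementary base pairs C<->G and A<->T, with E left unchanged.
COMPLEMENT = str.maketrans("CATGE", "GTACE")


def p(dna: str) -> str:
    """Morphism p: D_j -> complementary sequence."""
    return dna.translate(COMPLEMENT)


def tau(dna: str) -> str:
    """Morphism tau: rewrite every T as U (DNA alphabet -> RNA alphabet)."""
    return dna.replace("T", "U")


def g(dna: str) -> str:
    """Composite g = p followed by tau."""
    return tau(p(dna))


d_j = "TCATEAGCTGA"
print(p(d_j))  # AGTAETCGACT
print(g(d_j))  # AGUAEUCGACU
```

Note that `p` is its own inverse on set D, whereas `tau` changes the alphabet, which is the informal reason τ and g fall outside the single group B.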
Naturally, for RNA base sequences, similar treatments are possible in the single group B: morphism $h : Z \to Z$, dom(h) = $R_j^{\dagger}$, cod(h) = $R_k^{\dagger}$ (Figures 7 and 8).

There exists a morphism '$1_Z$' such that '$1_Z \bullet h = h = h \bullet 1_Z$'. (51)

Ordinarily, in prokaryotic cells, the DNA sequences are transcribed along their entire length. For eukaryotic cells, a splicing process is needed using nascent pre-messenger RNA (pre-mRNA), where introns are removed and exons are joined before producing a correct protein through translation, resulting in the mature messenger RNA (mRNA). Thus, the previous procedure concerned the prokaryotic cell or the pre-translation of pre-mRNA in the eukaryotic cell. Therefore, to treat the products after this RNA splicing procedure in the eukaryotic cell, the following approach might be possible. The removal of introns can be regarded as changes from a certain series of bases to 'E's as follows.

If 'GUA' is removed from 'A(GUA)EUCGACU...' to become 'A( )EUCGACU...', this procedure can be described as '$R_j^{\dagger} \to Rs_j^{\dagger}$':

$$R_j^{\dagger} = [A_1(G_2|U_3|A_4)E_5|U_6|C_7|G_8|A_9|C_{10}|U_{11}|E_{12}|E_{13}|E_{14}|\ldots], \tag{52}$$

$$Rs_j^{\dagger} = [A_1(E_2|E_3|E_4)E_5|U_6|C_7|G_8|A_9|C_{10}|U_{11}|E_{12}|E_{13}|E_{14}|\ldots]. \tag{53}$$

The $Rs_j^{\dagger}$ form a set Rs = {$Rs_j^{\dagger}$ (j = 1, 2, 3, ...)} that is a part of set R (see Figures 7 and 8).

Hereon, we admit 'E's in the sequences of RNAs (as elements of set Rs) during the operations before morphism 'τ' and after morphism 'j' to maintain theoretical consistency. Thus, if the result of a series of these maps is '$Rs_j^{\dagger} = A_1E_2E_3E_4E_5U_6C_7G_8A_9C_{10}U_{11}E_{12}E_{13}E_{14}\ldots$', then the actual RNA sequence should be interpreted as 'AUC...'. Specifically, an equivalent-sized substitution of some bases in pre-mRNA with 'E's can be written as morphism $j : Z \to Zs$, dom(j) = $R_j^{\dagger}$, cod(j) = $Rs_j^{\dagger}$. There exists a morphism '$1_{Zs}$' such that '$1_{Zs} \bullet j = j = j \bullet 1_{Zs}$'.

Morphism j changes some series of bases from 'C, A, U, G, E' to an equivalent-sized series of 'E's within the partial operations of the group B. However, morphism j fails the group axioms, as an inverse might not be definable.

Finally, as in §1, we apply the simultaneous deletions of all explicit 'E's of mRNA other than the trailing 'E's, the state after these deletions being denoted with '< >'; for

$$\begin{aligned}
Rs_j^{\dagger} &= [A_1E_2E_3E_4E_5U_6C_7G_8A_9C_{10}U_{11}E_{12}E_{13}E_{14}\ldots] \\
&= [A_1(E_2E_3E_4)E_5U_6C_7G_8A_9C_{10}U_{11}E_{12}E_{13}E_{14}\ldots],
\end{aligned}$$

the description '<$Rs_j^{\dagger}$> = $[A_1U_2C_3G_4A_5C_6U_7\ldots]$' is specified without explicit non-trailing 'E's. In this regard, as in §1, if some indels (insertions/deletions) occur at certain bases of <$Rs_j^{\dagger}$>, as for '<$Rs_j^{\dagger}$> = $[A_1U_2(E_3)G_4(E_5)C_6U_7\ldots]$' (with the deletion of '$C_3$' and '$A_5$'), we state the result as '<<$Rs_j^{\dagger}$>> = $[A_1U_2G_3C_4U_5\ldots]$'. The $R_j^{\dagger}$s include the <$Rs_j^{\dagger}$>s and <<$Rs_j^{\dagger}$>>s from the set R, and both still satisfy the postulates of group B. This rule is a relative postulate and explicit 'E's are not absolutely forbidden in the <$Rs_j^{\dagger}$>s or <<$Rs_j^{\dagger}$>>s; hence further indels of 'E's into the <$Rs_j^{\dagger}$>s or <<$Rs_j^{\dagger}$>>s are not forbidden.

Also, omissions of explicit 'E's are considered, as in '{<$Rs_j^{\dagger}$>} = $[A_1U_2G_4C_6U_7\ldots]$', where place numbers '3' and '5' are absent, indicating implicitly their presence in the vector. (Note that all products belong to group B.) Similarly, t-tuples of '< >'s are denoted '<<...<$Rs_j^{\dagger}$>...>> (t-tuple) = <$Rs_j^{\dagger}$>$_t$', representing multiple deletions of 'E's (t times).

Combinations of the symbols '{ }' and '< >' are also allowed when necessary, as for example {<{{<$Rs_j^{\dagger}$>}}>}, as long as the subscripted place numbers are adequately recognized/traced.

Nevertheless, the multiple use of '< >' to remove all 'E's in the vector $Rs_j^{\dagger}$ should have a unique meaning with regard to protein synthesis. As a result, the subsequent reading/translation in line with codon-like '[AUC|GAC|U...]', '[A|UCG|ACU|...]' or '[AU|CGA|CU...]' leads in an ordinary way to a description of protein synthesis. Through the use of '{ }' and/or '< >', the concept 'E' may have benefits, although this may need to be intensely explored in future studies.
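The splicing step, the deletion of explicit 'E's and the three codon-like readings can be followed on the running example. The sketch below assumes sequences are plain strings with trailing 'E's truncated; `splice_to_e`, `drop_e` and `read_frames` are hypothetical helper names standing in for morphism j, the '< >' operation and the bracketed readings respectively.

```python
def splice_to_e(rna: str, start: int, stop: int) -> str:
    """Replace the 1-based components start..stop with the same number of E's."""
    return rna[: start - 1] + "E" * (stop - start + 1) + rna[stop:]


def drop_e(rna: str) -> str:
    """Delete every explicit E (the '< >' operation on a truncated vector)."""
    return rna.replace("E", "")


def read_frames(rna: str, lead: int = 0) -> list[str]:
    """Split into triplets after an initial group of `lead` letters (0, 1 or 2)."""
    head = [rna[:lead]] if lead else []
    body = rna[lead:]
    return head + [body[i : i + 3] for i in range(0, len(body), 3)]


pre_mrna = "AGUAEUCGACU"
mrna_with_e = splice_to_e(pre_mrna, 2, 4)   # intron 'GUA' becomes 'EEE'
mrna = drop_e(mrna_with_e)
print(mrna_with_e, mrna)      # AEEEEUCGACU AUCGACU
print(read_frames(mrna))      # ['AUC', 'GAC', 'U']
print(read_frames(mrna, 1))   # ['A', 'UCG', 'ACU']
print(read_frames(mrna, 2))   # ['AU', 'CGA', 'CU']
```

Because `splice_to_e` discards the identity of the replaced bases, it cannot be undone from its output alone, which illustrates why an inverse for morphism j might not be definable.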

The procedure reversing transcription, found for example in retrovirus, is also de- 
scribable if additional options are added to the scheme. However, these options are 
omitted at this stage to keep the model simple. 

In summary, suppose we have a category C with objects 'X', 'Y', 'Z', 'Zs' and morphisms 'f', 'p', 'τ', 'g', 'h' and 'j'. We affirm that these definitions satisfy the postulates of a category. A list is given in Figure 7. Indeed, morphisms other than 'τ' and 'g' are simple group-theoretical products. One of the reasons we have introduced the concept of a category is that the translation from single-strand DNAs to RNAs is difficult or impossible to systematize as a group structure. Therefore, if we identify the differences, we can treat all manipulations, except for 'τ' and 'g', based simply on group B.

The expression 'hom(X, X)' denotes all morphisms f 'from X to X'. Likewise, 'hom(X, Y)' denotes all morphisms p 'from X to Y'. In addition, 'hom(Y, Z)' denotes all morphisms τ 'from Y to Z', and 'hom(X, Z)' denotes all morphisms g 'from X to Z'. Then, 'hom(Z, Z)' denotes all morphisms h 'from Z to Z'. Finally, 'hom(Z, Zs)' denotes all morphisms j 'from Z to Zs'. (Details are displayed in Appendix F.)

As is explained in §3 and §4, the rotational group can be regarded as a specific bijection of the wallpaper group [2,44-47], so we can describe this relationship naturally in a category theory-like way where two categories $C_1$ and $C_2$ are linked.

First, we consider two categories $C_1$ and $C_2$ with a 'functor F' from $C_1$ to $C_2$, written '$F: C_1 \to C_2$'. For example, the pre-category C is denoted $C_1$ and the product of functor F on category $C_1$ is denoted $C_2$ [48,49]. Note that the only difference between $C_1$ and $C_2$ is assumed to be the nature of its expression; morphism $f_1 = B_1$ ($\in$ category $C_1$) is based on the wallpaper group in Figure 1 or 6; e.g.,

$$B_1 = [r_1|l_2|u_3|\ldots|n_i|\ldots|r_{N-1}|d_N|\ldots] = [\ldots|x[a_i, b_i]_i|\ldots] = [\ldots|r^{a_i} \bullet u^{b_i}{}_i|\ldots] \quad (\in \text{group } B_1). \tag{54}$$

Additionally, morphism $f_2 = B_2$ ($\in$ category $C_2$) is based on the rotational group over the Gaussian plane in Figure 5; e.g.,

$$B_2 = [\omega^1{}_1|\omega^4{}_2|\omega^2{}_3|\ldots|\omega^0{}_i|\ldots|\omega^1{}_{N-1}|\omega^3{}_N|\ldots] = [\ldots|(a_i + 2b_i)\omega_i|\ldots] \quad (\in \text{group } B_2). \tag{55}$$

With regard to the identity morphisms, we have

$$\begin{aligned}
1_{X1} = 1_{Y1} = 1_{Z1} = 1_{Zs1} &= [n_1|n_2|n_3|\ldots|n_i|\ldots|n_{N-1}|n_N|\ldots], \\
1_{X2} = 1_{Y2} = 1_{Z2} = 1_{Zs2} &= [\omega^0{}_1|\omega^0{}_2|\omega^0{}_3|\ldots|\omega^0{}_i|\ldots|\omega^0{}_{N-1}|\omega^0{}_N|\ldots].
\end{aligned} \tag{56}$$

Herein, we view 'functor F: $C_1 \to F(C_1)$ (= $C_2$)' in the following way [48,49]. (Details are shown in Appendix G.)
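On the level of operator components, the passage from the wallpaper expression (54) to the rotational expression (55) is the map $x[a, b] \mapsto (a + 2b)\omega$. The following sketch checks that this map preserves composition and the identity, which is the minimum one would ask of the morphism part of a functor; it is an assumption-laden illustration (the pair encoding of r, l, u, d, n and all function names are hypothetical) and not the definition given in Appendix G.

```python
# Assumed pair encoding of the basic translations: r=(1,0), l=(-1,0), u=(0,1), d=(0,-1), n=(0,0).
PAIR = {"n": (0, 0), "r": (1, 0), "l": (-1, 0), "u": (0, 1), "d": (0, -1)}


def compose(p: tuple[int, int], q: tuple[int, int]) -> tuple[int, int]:
    """Composition in the wallpaper expression: add the (a, b) pairs."""
    return (p[0] + q[0], p[1] + q[1])


def functor(pair: tuple[int, int]) -> int:
    """Send x[a, b] to the exponent of omega, (a + 2b) mod 5."""
    a, b = pair
    return (a + 2 * b) % 5


def map_operator_vector(ops: str) -> list[int]:
    """Translate a wallpaper operator vector into exponents of omega."""
    return [functor(PAIR[o]) for o in ops]


def preserves_composition() -> bool:
    """Check F(p . q) = F(p) + F(q) (mod 5) on all pairs of basic translations."""
    return all(
        functor(compose(PAIR[p], PAIR[q])) == (functor(PAIR[p]) + functor(PAIR[q])) % 5
        for p in PAIR
        for q in PAIR
    )


print(map_operator_vector("rlunrd"))               # [1, 4, 2, 0, 1, 3]
print(preserves_composition(), functor(PAIR["n"]))  # True 0
```

The first printed list reproduces the explicit exponents shown in (55) for the components r, l, u, n, r, d of (54).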
Note that a similar definition, such as the composition of $C_1$ based on the linear group and $C_2$ based on the rotational group, is possible, the two being linked with 'functor F'.

This is satisfied provided an adequate definition of 'functor F' is given, and we presume that the morphisms described previously formulate a model that renders one of the forms of the canonical "central dogma" proposed by Crick in 1958 [26].

In the transcription of RNA bases, the 'RNA splicing' process is well known, whereby 'intron sequences' are excised and 'exon sequences' are combined to condense effectual information for further interpretation in protein synthesis. However, as for further processing of the triplets of bases, e.g., 'ACG' and 'AUG', a considerable number of models have been reported, e.g., [12,18-21,33,34,50]. We refrain from pursuing this issue at present.

Results 

We added an imaginary base 'E' to the set of actual DNA bases, and composed group 
Z 5 of basic translational operations on grid-points of a cruciform wallpaper pattern 
constructed of the five base letters. Moreover, using the same five letters, we integrated 
the wallpaper group as the combination of the linear group over the horizontal line and the rotational group based on symmetries of a fivefold phasor diagram on the unit circle in the Gaussian plane. Additionally, changes in the sequences of the DNA bases are treated using set D, the set of all possible sequences of DNA bases that also contain 'E'. Also, 'Dj's are drawn graphically as polygonal lines. Moreover, by combining group Z 5 ,
the operators that rearrange bases of DNA sequences constitute the group B. Using 
these results, simple changes of sequences, insertions/deletions, and recombination of 
DNA bases are also treatable via a synthesis of group-theoretical operations between 
sets D and group B. Together with this, all results obtained for DNA pertain to RNA 
by replacing T with U. Using these tools, category theory-like language is introduced to 
describe the canonical "central dogma" that is expected to integrate DNA-based processes, although the overall profile and range of applicability are unclear at this stage. Alternatively, by introducing the manipulations '{ }' and '< >', operations on states of 'E's in 'Dj's/'Rj's, whether explicit or not, can be performed in parallel with the conventional description for DNA/RNA sequences.

Discussion 

The issue treated in this article is, roughly speaking, the combination of two ideas: one
is the wallpaper pattern in the context of DNA sequencing (§1-§7), and the other is
the tentative development towards a systematization of molecular/genetic biology in the
style of a category-theory-like description (§8). Essentially, the two are different topics,
although strongly connected. The former is an independent study on symmetry modeling
of DNA sequences, whereas the latter can be re-expressed using different material as long
as the basic elements can be treated within a category-theory-like model satisfying group
theory-like postulates.

In this article, we considered a group/category theory-like treatment by devising an
expedient grid-point array and group operations that move over the array. We discussed
whether and how a more synthesized description can be constructed using the simple
postulates of a group. We then used the basics of category theory to describe processes,
although this is only at a preliminary stage.

For an application of our ideas, we have chosen DNA sequencing from the perspective
of not only coding sequences of DNA bases but also describing insertions/deletions
of DNA sequences using a single operation that is an element of a group. The expedient
'E' permits inserting and/or deleting sequences depending on the purpose. Specific
notation was introduced so that vector-like DNA sequences and operations can be
composed as a set and a group, respectively.

Ordinarily, a method to describe DNA sequences is limited in scope by focusing
on only one aspect, such as recognizing each base sequentially (e.g., A, G, T, A, C, ... from
'AGTAC...') [13,33-35], where an operation like 'rotation', 'transition' or 'conversion' based
on a certain solid is often used. Another focus of attention is the rules for interpreting
codons in synthesizing proteins from DNA sequences. These rules are defined to capture
the specific activity from the viewpoint of group-like operations [18-22].

Our model also enables us to treat three manipulations (base changes, insertions/deletions,
and recombination) as one type of operation in the group, with easily imaginable graphic
displays such as Figure 4, although it is only an accessory tool at this stage. Increasing
the degree of freedom by one and integrating changes of coding with sequence
recombination might yield some versatile utility.

When inserting/deleting sequences of bases into the main DNA sequence, even if the
endpoints of the base series are identified precisely, it appears that manipulations via
'E's are not always necessary. Nonetheless, to determine the final order of the bases in
these cases, we must track base changes from one to the next (including 'E', even when
lost or deleted). If we use the 'E'-assisted manipulations for coding, we need only
examine the inclusions; the rest remains unchanged in order. Additionally, we assume
that when any operation is performed, the position and number of 'E's should be fixed
so that the order of any component of 'Dj' or 'Bj' is not changed, at least during
operations (e.g., (A.7)). The exception is specifically the insertion/deletion of 'E's such as in
(29-34) and (36-45).

We briefly point out the notational benefits of the imaginary 'E's. There are three: 1)
to adjust the number of base types in DNAs and RNAs (from '4' to '5'), thus enabling
group-theoretical composition over the (two-dimensional) plane; 2) to link the notational
sequences of DNAs and mRNAs in a single format that can be used in a more
compact database to record and analyze genetic information; and 3) to express sequences
of DNAs/RNAs as a vector in three different ways: a) with explicit 'E's in the
vector, b) with implicit 'E's in the vector, and c) with all 'E's omitted in the vector except
the trailing 'E's. The last offers flexibility in storing worldwide genetic data in a single
set. We suggest that exhaustiveness is one of the potential strengths of the model, adding
versatility in addressing the possible behaviors of DNA/RNA sequences.
Although this may be far from practical application at present, a more rigorous
methodology may provide a means in the near future.

Regarding the style of the grid-point/cruciform/wallpaper pattern (Figure 1) in defining the
group postulates, one of its advantages is that each base is surrounded by the four others.
This symmetrical simplicity is absent in the linear group and the rotational group (Figures 1,
5 and 6), where the relative position of the five bases is fixed and thereby restrictive. Also, it
might be crucial that the number '5' is key in enabling composition of the sort provided by
the wallpaper group using the cruciform, together with an identity element necessary to satisfy
the group postulates. Being a prime, '5' will be convenient in further developments of the model
that exploit algebraic structures such as rings or fields.

A similar synthesis might be possible between a modulo 7 additive rotational group
based on a sevenfold phasor diagram and a space group depending upon the six
'forward/backward', 'up/down' and 'left/right' directions. In practice, a space group is formed that
consists of three orthogonal cruciforms comprising the six directions (±x, ±y, ±z) with
seven elemental operations {m_u (up), m_d (down), m_r (right), m_l (left), m_f (forward), m_b
(backward) and m_n (no movement)}. These determine the operations of the group,
which permute seven bases (a prime number) or seven letter-like constituents. Analogous
to the fivefold phasor diagram, we draw equispaced elements on the unit circle in
the Gaussian plane; suppose 'φ = 2π/7 (rad)', then the set {φ_1 (= φ), φ_2 (= 2φ), φ_3 (= 3φ),
φ_4 (= 4φ), φ_5 (= 5φ), φ_6 (= 6φ), φ_0 (= φ_7 = 0)} parameterizes the rotational group [51], and
both are, at least, in partial correspondence. We presume that, by extension, bringing together
an n-dimensional space group (using the 2n + 1 elements associated with the ± directions
along the n axes and E) and a rotational group based on the (2n + 1)-fold phasor diagram on
the unit circle (with 2n + 1 elements as the vertices of a polygon) might be possible. In this
article, we have focused only on 'n = 2' in §1-§8.
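
The neighbourhood property and its presumed extension can be probed numerically. The following minimal Python sketch rests on an assumption that is not stated in the article: a step along the k-th axis (k = 1, ..., n) is taken to shift the letter index by +k, and a step backwards by -k, modulo 2n + 1. For n = 2 this agrees with the shifts that can be read off Appendix A (right = 1, up = 2, down = 3, left = 4); for n = 3 it is purely illustrative. The function names are hypothetical.

```python
from itertools import product

def letter_index(point):
    """Letter index at an integer grid point, assuming axis k (k = 1..n) shifts by k."""
    n = len(point)
    return sum(k * x for k, x in enumerate(point, start=1)) % (2 * n + 1)

def neighbour_indices(point):
    """Letter indices of the 2n nearest neighbours (one step along +/- each axis)."""
    result = set()
    for axis in range(len(point)):
        for step in (1, -1):
            moved = list(point)
            moved[axis] += step
            result.add(letter_index(moved))
    return result

def every_point_sees_all_others(n, radius=3):
    """Check, on a finite window, that each point is surrounded by all 2n other letters."""
    m = 2 * n + 1
    return all(
        neighbour_indices(p) == set(range(m)) - {letter_index(p)}
        for p in product(range(-radius, radius + 1), repeat=n)
    )

print(every_point_sees_all_others(2))  # cruciform case with five letters
print(every_point_sees_all_others(3))  # three orthogonal cruciforms with seven letters
```

Under this assumed assignment, both checks succeed, which is consistent with the partial correspondence between the space group and the (2n + 1)-fold rotational group suggested above; it does not establish that correspondence in general.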

Apart from the above, the model based on the wallpaper pattern might have a close
relationship with cellular automata [52]. Appropriate definitions of the wallpaper pattern
for the five bases might find expression as a link between groups and cellular automata [53].

One consideration concerns whether a more integrated/synthesized style of describing
biomolecular processes is possible using only simple, primitive defining rules, in
particular when describing genetic processes such as DNA transcription and the RNA-directed
synthesis of proteins. Whereas the group postulates might be too restrictive to define
molecular behavior, category postulates might enable such schemes to proceed because
they are weaker than those defining a group. If the interpretation of DNA by
messenger RNA is definable within category theory, and protein synthesis is expressible
within the same theory, there might be advantages in having the molecular system
classified and treated in a reduced size in the database. At least, we conjecture that these
ideas might be useful for identifying impossible phenomena associated with changes of
DNA sequences, thereby reducing unnecessary, superfluous efforts or roundabout
paths that might encroach on researchers' limited time for investigation. Such
waste might be avoided if the impossibility of certain themes were known beforehand.
From this standpoint, we believe that a mathematical systematization (in a general and
exceptionless manner) is crucially important for future molecular/genetic biology.

The limitations of the present model should be noted. First, the wallpaper pattern
drawn in Figure 1 is one example of various patterns. In general, the wallpaper groups
have been classified into seventeen types [2,44-47]. There could be other types of
patterns like Figure 1 and groups upon which to compose this sort of model. For
instance, if we exchange all 'A's for all 'C's, and all 'G's for all 'T's in the model presented
in this article, an almost equivalent model (§1-§8) is constructed. Other arrangements
might provide still unknown advantages that enable models like ours to be treated in a
more rational manner. It remains unclear how to construct an optimal method to determine
models yielding the wallpaper pattern of Figure 1 and the bijection given in Figure 3, and
to develop the categories presented in Figures 7 and 8. The best positions of the five bases
should be examined under a rigorous methodology.

Second, a Cartesian vector is defined as a combination of components on which
operations are conducted independently. Indeed, we can perform operations on the
i-th component of Dj of set D using the (i + 1)-th component of Bj by adding an 'E'
at any place before the (i-1)-th component of Dj. This is because the components
after the i-th of Bj shift to the right within the vector Bj. Therefore, in appearance,
the components at different positions are essentially spectators (e.g., a base 'C3'
cannot change into either 'A5' or 'G5' by any Bj except via 'E'-assisted
manipulations). In this case, after the insertion of two 'E's between the '2' and '3'
components, 'C3+2 (= C5)' can become either 'A5' or 'G5' by an appropriate Bj acting at
C5. However, that might raise some confusion. For 'E'-assisted operations (such as
29-34, 38, 40), the results might change according to the position and number of inserted/
deleted 'E's, which yields mismatches between the i-th component of 'Dj' and that
of 'Bj'. We believe further studies are warranted to find a better descriptive format for the
model.
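
The index mismatch described in this second limitation can be made concrete. The sketch below is a minimal illustration under stated assumptions: the letters are encoded as E = 0, C = 1, A = 2, T = 3, G = 4 and the operations n, r, u, d, l as shifts 0, 1, 2, 3, 4 modulo 5 (an encoding inferred from Appendix A, not notation used in the article), and the sequence, the operator vector, and the helper names are hypothetical.

```python
BASES = "ECATG"                                   # assumed encoding: index k means E acted on by r^k
SHIFT = {"n": 0, "r": 1, "u": 2, "d": 3, "l": 4}  # assumed shifts modulo 5, read off Appendix A

def act(d, b):
    """Let each component of the operator vector b act on the same-numbered component of d."""
    return "".join(BASES[(BASES.index(x) + SHIFT[o]) % 5] for x, o in zip(d, b))

def insert_e(d, position, count):
    """Insert `count` letters 'E' before the 1-based component `position`, keeping the length."""
    i = position - 1
    return (d[:i] + "E" * count + d[i:])[:len(d)]

d = "ACCGTEEEEE"   # illustrative sequence with trailing 'E's
b = "nnnnrnnnnn"   # only the 5th component of the operator is non-trivial (r)

without_insertion = act(d, b)               # r acts on T5
with_insertion = act(insert_e(d, 3, 2), b)  # C3 has moved to position 5, so r acts on it
print(without_insertion)  # ACCGGEEEEE
print(with_insertion)     # ACEEAGTEEE
```

The same operator vector therefore produces different real sequences depending on where the 'E's were inserted, which is the source of the possible confusion noted above.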

Third, as for the graphical displays of 'D's in Figure 4, although the sequences of
'Dm's (m = 1, 2, 3) are in reality the same, the respective expressions are not always
unique because the presence of the imaginary 'E's changes the shape of each sequence;
e.g., 'D1 = [A1 C2 C3 ( ) G4 T5 E6 E7 E8 ...]' and 'D2 = [A1 C2 C3 (E4 E5) G6 T7 E8 E9 E10 ...]'
are different over the wallpaper pattern despite being equivalent as real sequences.
Although, with electronic tools, these graphics might be versatile for the detection
or identification of DNA sequences, they might produce other confusions in
the present form. We hope that more appropriate devices will be developed in future
studies.
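
One simple way to test whether two 'E'-padded vectors are equivalent as real sequences is to compare them after all 'E's have been dropped. The following short Python sketch illustrates this with truncated versions of the D1 and D2 given above; the function name `real_sequence` is hypothetical and the comparison rule is an assumption made for illustration, not a procedure defined in the article.

```python
def real_sequence(d):
    """Return the sequence of actual bases, ignoring every imaginary 'E'."""
    return "".join(base for base in d if base != "E")

d1 = "ACCGTEEE"     # D1 = [A1 C2 C3 ( ) G4 T5 E6 E7 E8 ...], truncated
d2 = "ACCEEGTEEE"   # D2 = [A1 C2 C3 (E4 E5) G6 T7 E8 E9 E10 ...], truncated

print(d1 == d2)                                # False: different vectors over the pattern
print(real_sequence(d1) == real_sequence(d2))  # True: the same real sequence ACCGT
```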

Fourth, DNA transcription to RNA and/or mRNA and translation of RNA and/or
mRNA into proteins at the ribosomes are performed using a grammar rule based on a
three-base set called a codon. Codons carry the information to specify the twenty types of
amino acids from which proteins are synthesized; for example, 'CAG' codes for 'glutamine'.
As mentioned before, a number of approaches have been proposed that exploit
group-theoretic methods. These cover the rules for the composition of base triplets 'XXX',
the ways of reading codons, and models that compose geometric solids such as the
tetrahedron and hexahedron [12,18-21,33,34,50].
The rules for treating this aspect (transcription and translation of the information in
DNA bases) are not established in the present article. In addition, there are specific types of
codon, such as 'TAA', 'TGA', and 'TAG', which are classified as 'stop' or 'halt'
commands. Aside from this, there are various rules related to biogenetic activities such
as DNA repair, alternative splicing, transposition, and translocation. These specific
characterizations are lacking in our model, so further improvements on this issue are
desirable.
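
For orientation only, the codon rule that the present model does not yet cover can be sketched as a lookup over base triplets. The table below is deliberately partial: it contains only the codons named in the text ('CAG' for glutamine and the three stop codons), and the function name and the "?" placeholder for codons outside this partial table are hypothetical.

```python
# Deliberately partial table: only the codons named in the text above.
CODON_TABLE = {"CAG": "Gln", "TAA": "STOP", "TGA": "STOP", "TAG": "STOP"}

def read_codons(dna):
    """Read a DNA string three bases at a time, halting at the first stop codon."""
    result = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        meaning = CODON_TABLE.get(dna[i:i + 3], "?")  # "?" marks a codon not in this partial table
        result.append(meaning)
        if meaning == "STOP":
            break
    return result

print(read_codons("CAGCAGTAACAG"))  # ['Gln', 'Gln', 'STOP']
```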

Fifth, the traditional symmetry model of DNA bases is often based on the chemical
types 'purine/pyrimidine', 'amino/keto', and 'strong/weak hydrogen bonding' using
biomolecular characteristics, which often have advantages for their treatment where
three-dimensional graphics aid the imagination and 'matricized' expressions are
possible [29,35,36]. In our model, we merely use a rule for complementary pairing in §4
and §5. No restriction on couplings between 'C', 'A', 'T', 'G' and 'E' is postulated in the
present article. A number of combinations might therefore occur in which unrealistic pairings
of bases (e.g., 'A-G', 'C-C', and 'T-E') lead to futile and wasteful results in applications.
We hope that future studies can solve this problem.

Sixth, there might be too many speculative conjectures and hypothetical situations,
which should instead be established as scientific facts using verified methods. Thus, a more
rigorous examination for a rational style with a more effective methodology is
necessary.

Our model is far from a complete systematization. However, we believe that
some principal breakthrough must be pursued if we intend to systematize
a descriptive model, and that appropriate definitions, once devised, might help to
systematize molecular/genetic biology in a more optimized and sophisticated manner,
making a significant contribution to the field.

Conclusions 

Within the considerable limitations of our methodology, we consider that there is fertile
ground in which variants of the symmetry model for genetic coding based upon a specific
wallpaper group can be constructed. By integrating the linear group and the rotational group
over a specific wallpaper pattern, a more integrated formulation based on a group/category
theory-like description is open to exploration in applications to a number of
topics in molecular/genetic biology.

Appendix A 

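According to Figures 1, 3 and 6, the following relationships are confirmed straightforwardly
between any bases and independently of the type of bases:

\begin{equation}
\begin{aligned}
d \bullet d &= l \bullet u = u \bullet l = r \bullet n = n \bullet r = r,\\
r \bullet r &= d \bullet l = l \bullet d = u \bullet n = n \bullet u = u,\\
l \bullet l &= r \bullet u = u \bullet r = d \bullet n = n \bullet d = d,\\
u \bullet u &= r \bullet d = d \bullet r = l \bullet n = n \bullet l = l,\\
n \bullet n &= r \bullet l = l \bullet r = u \bullet d = d \bullet u = n.
\end{aligned}
\tag{A.1}
\end{equation}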



Here the symbol '↔' signifies 'bijection', and the meaning of 'x[-1, 0]' is explained in
§4. Hence, operators that are regarded as effecting changes from one base to another can be
re-expressed as illustrated in the following examples for various types of component
operations:





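\begin{align}
b[E \to C] &= b[C \to A] = b[A \to T] = b[T \to G] = b[G \to E] = r\ (\leftrightarrow \omega_1) = x[1, 0], \tag{A.2}\\
b[E \to A] &= b[A \to G] = b[G \to C] = b[C \to T] = b[T \to E] = u\ (\leftrightarrow \omega_2) = x[0, 1], \tag{A.3}\\
b[E \to T] &= b[T \to C] = b[C \to G] = b[G \to A] = b[A \to E] = d\ (\leftrightarrow \omega_3) = x[0, -1], \tag{A.4}\\
b[E \to G] &= b[G \to T] = b[T \to A] = b[A \to C] = b[C \to E] = l\ (\leftrightarrow \omega_4) = x[-1, 0], \tag{A.5}\\
b[E \to E] &= b[C \to C] = b[A \to A] = b[T \to T] = b[G \to G] = n\ (\leftrightarrow \omega_0\ (= \text{no rotation}) = \omega_5) = x[0, 0]. \tag{A.6}
\end{align}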
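
As an illustrative aid, the relationships (A.1)-(A.6) can be checked numerically. The following minimal Python sketch assumes that the five letters are indexed E = 0, C = 1, A = 2, T = 3, G = 4 and that n, r, u, d, l act as additions of 0, 1, 2, 3, 4 modulo 5. This encoding is inferred from (A.2)-(A.6) rather than being notation used in the article, and the names BASES, OPS, act and compose are hypothetical.

```python
BASES = "ECATG"                                 # assumed encoding: index k means E acted on by r^k
OPS = {"n": 0, "r": 1, "u": 2, "d": 3, "l": 4}  # assumed shift of each operation, modulo 5
NAMES = {shift: name for name, shift in OPS.items()}

def act(base, op):
    """Apply one basic operation to one base letter, as in b[X -> Y]."""
    return BASES[(BASES.index(base) + OPS[op]) % 5]

def compose(op1, op2):
    """Compose two basic operations, as in the relationships (A.1)."""
    return NAMES[(OPS[op1] + OPS[op2]) % 5]

# Print the full composition table; each row lists a composed with n, r, u, d, l.
for a in "nrudl":
    print(a, [compose(a, b) for b in "nrudl"])

print(act("C", "r"), act("A", "d"), act("E", "l"))  # A E G
```

Because the table is that of addition modulo 5, the correspondence with the rotations ω_0, ..., ω_4 of the fivefold phasor diagram follows directly under this encoding.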



Appendix B 

As for B,

1) Associativity: '(B_j•B_k)•B_l = B_j•(B_k•B_l)' holds for all positive integers j, k and l.

2) Identity: 'B_0 = [n_1|n_2|n_3| ... |n_(n-1)|n_n|n_(n+1)|n_(n+2)|n_(n+3)| ...]' is an identity
element that satisfies 'B_0•B_m = B_m•B_0 = B_m' (i = 1, 2, 3, ...; 'n_i (= n)' is an element
of Z5 (no movement of the point P)).

3) Inverses: there exists a unique 'B_m^-1' that satisfies 'B_m^-1•B_m = B_m•B_m^-1 = B_0'.
Actually, the components of the inverse are the inverses of each individual
component.

4) Commutativity: 'B_j•B_k = B_k•B_j'.

5) Closure law: any 'B_j•B_k' belongs to the set B.
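
The five postulates above can be spot-checked on finite truncations of the operator vectors. The sketch below assumes that each component of an element of B is a shift modulo 5 (n = 0, r = 1, u = 2, d = 3, l = 4, an encoding inferred from Appendix A) and that composition is componentwise; the truncation length N, the random test vectors, and all function names are hypothetical. A numerical check of this kind illustrates the postulates but does not prove them.

```python
import random

N = 12    # illustrative truncation length; the vectors in the text are unbounded
MOD = 5

def compose(b1, b2):
    """Componentwise composition of two operator vectors (shifts modulo 5)."""
    return [(s1 + s2) % MOD for s1, s2 in zip(b1, b2)]

def inverse(b):
    """Componentwise inverse of an operator vector."""
    return [(-s) % MOD for s in b]

IDENTITY = [0] * N    # every component is n (no movement)

rng = random.Random(0)

def random_operator():
    return [rng.randrange(MOD) for _ in range(N)]

bj, bk, bl = random_operator(), random_operator(), random_operator()
checks = {
    "associativity": compose(compose(bj, bk), bl) == compose(bj, compose(bk, bl)),
    "identity": compose(IDENTITY, bj) == bj == compose(bj, IDENTITY),
    "inverses": compose(inverse(bj), bj) == IDENTITY == compose(bj, inverse(bj)),
    "commutativity": compose(bj, bk) == compose(bk, bj),
    "closure": all(0 <= s < MOD for s in compose(bj, bk)),
}
print(checks)
```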

Appendix C 

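\begin{align}
D_2 \circ B_{(2 \to 3)} \bullet B_{(3 \to 4)}
&= [A_1|C_2|C_3(E_4|E_5)G_6|T_7|E_8|E_9|\ldots] \notag\\
&\quad \circ [n_1|n_2|n_3(u_4|d_5)n_6|n_7|n_8|n_9|\ldots] \bullet [l_1|u_2|d_3(r_4|d_5)n_6|l_7|n_8|n_9|\ldots], \notag\\
&= [A_1 \circ n_1 \bullet l_1|C_2 \circ n_2 \bullet u_2|C_3 \circ n_3 \bullet d_3(E_4 \circ u_4 \bullet r_4|E_5 \circ d_5 \bullet d_5)G_6 \circ n_6 \bullet n_6|T_7 \circ n_7 \bullet l_7|E_8 \circ n_8 \bullet n_8|E_9 \circ n_9 \bullet n_9|\ldots]. \tag{A.7}
\end{align}

Again, with reference to Figure 1 or Appendix B,

\begin{align*}
&= [A_1 \circ l_1|C_2 \circ u_2|C_3 \circ d_3(E_4 \circ d_4|E_5 \circ r_5)G_6 \circ n_6|T_7 \circ l_7|E_8 \circ n_8|E_9 \circ n_9|\ldots],\\
&= [C_1|T_2|G_3(T_4|C_5)G_6|A_7|E_8|E_9|\ldots] = D_4.
\end{align*}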
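
The calculation in (A.7) can be reproduced componentwise. The following Python sketch uses the first nine components of the vectors above and assumes the encoding E = 0, C = 1, A = 2, T = 3, G = 4 with n, r, u, d, l acting as shifts 0, 1, 2, 3, 4 modulo 5 (inferred from Appendix A, not notation used in the article); the helper names are hypothetical and the parentheses marking the inserted 'E's are omitted.

```python
BASES = "ECATG"                                   # assumed encoding: index k means E acted on by r^k
SHIFT = {"n": 0, "r": 1, "u": 2, "d": 3, "l": 4}  # assumed shifts modulo 5
NAME = {shift: name for name, shift in SHIFT.items()}

def compose(b1, b2):
    """Componentwise composition of two operator vectors."""
    return "".join(NAME[(SHIFT[p] + SHIFT[q]) % 5] for p, q in zip(b1, b2))

def act(d, b):
    """Componentwise action of an operator vector on a sequence."""
    return "".join(BASES[(BASES.index(x) + SHIFT[o]) % 5] for x, o in zip(d, b))

d2 = "ACCEEGTEE"    # [A1|C2|C3(E4|E5)G6|T7|E8|E9|...]
b23 = "nnnudnnnn"   # [n1|n2|n3(u4|d5)n6|n7|n8|n9|...]
b34 = "ludrdnlnn"   # [l1|u2|d3(r4|d5)n6|l7|n8|n9|...]

combined = compose(b23, b34)
d4 = act(d2, combined)
print(combined)  # luddrnlnn
print(d4)        # CTGTCGAEE
```

Acting with the two operators one after the other gives the same result as acting once with their composition, as expected for a group action.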



Appendix D 

Naturally, the series D k is generated through the following sequence of operations: 

B 0 _ k) = [r 4 " 1 x | r 3 " 2 2 1 r 1 -^ | . . . | r^S | . . . | r 1 " 3 ^ x | r 3 " 2 N | r 0 " ° N+1 1 r°-° N+2 |r°-° N+3 1 . . -] , 
= [r 3 1 |r 1 2 |r 1 3 |...|r 3 i |...|r-Vi|r 1 N |r 0 N + i|r 0 N + 2|r°N + 3|...], 
= [r 3 1 |r 1 2 |r 1 3|...|r 3 i |...|r- 2 + 5 N _i|r 1 N|r 0 N +1 |r° N+ 2|r 0 N + 3|...], 
= [r 3 1 |r 1 2 |r 1 3 |...|r 3 i |...|rV 1 |r 1 N |r 0 N + i|r 0 N + 2|r 0 N + 3|...]. 



Then, 



D j° B (Hk) 



[C 1 |A 2 |E 3 |...|C i |...|T N _ 1 |A N |E N+1 |E N+2 |E N+3 |...]4r 3 1 |r 1 2 |r 1 3|...|r 3 i |... 



...|r 3 N-l|r 1 N|r°N+l|l 0 N + 2|t 0 N+3|...], 



[E.r\|E.r 2 2 |E.r° 3 |...|E.r 1 i |...|E.rVi|E-r 2 N |E.rVi|E'rV2|E'r () N + 3|...]-[r 3 i|: 

= [E-r 1+3 1 |E.r 2+1 2 |ET 0 + 1 3 |...|ET 1 + 3 i |... 



IrM... 



..|r 3 N _ 1 |r 1 N |A + i|A+ 2 |A + 3 



...|E.r 3 + 3 N _ 1 |E.r 2 +V|E.r° N+ i|E'r'' N+ 2|E"rV3|...],= [E'r 1 + 3 1 |E.r 2 + 1 2 |E.r 0 

...|E.r 3 +V 1 |E.r 3 N |E.r 0 N+1 |E.r 0 N+2 |E.r° N+ 3l-], 
= [E.r^|E.r 3 2 |E.r 1 3 |...|E.r^|...|E.rVl|E•r 3 N |E.rVl|E•rV2|E•rV3|...], 
= [E.r 4 1 |E»r 3 2 |E.r 1 3 |...|E.r 4 i |...|E.r 6 - 5 N _i|E.r 3 N |E.r 0 N+1 |E.r 0 N+2 |E.r 0 N+3 |...; 
= [E.r 4 1 |E.r 3 2 |E.r 1 3 |...|E.r 4 i |...|E.r 1 N _ 1 |E.r 3 N |E.r 0 N+ i|E-r 0 N+ 2|E.r° N+ 3|...], 
= [G 1 |T 2 |Q|...|G i |...|C N _ 1 |T N |E N+1 |E N+2 |E N+3 |...] = D k . 



...lE-r^M. 



(A.9) 
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Appendix E 

D r B (Hjt) 

= [E-x[0, l]i|E-x[0,-l] 2 |E-x[0,0] 3 |...|E-x[l,0]i|... 
...|E.x[-l,0] N _ 1 |E.x[0, l] N |E.x[0,0] N+1 |E-x[0,0] N+2 |E.x[0,0] N+3 |...] 
"[x[0,-2]i|x[0,2] 2 |x[0,0] 3 |...|x[-2,0]i|...|x[2,0] N _i|x[0,-2] N |x^ 
- [E°rW 1 |E°rW 1 2 |E°rW 3 |...|E°rW i |... 

...IE^.uViIE^.uVIE^.uViIE^.uV^IE^.uVsI...], 

*[rW\|rW 2 |rW 3 |...|r- 2 ^ 

= [Eor^u^r^u^ilEor^u-^r^^slEor^u^r^u^l.-.lEor^u^r^.u 0 !!... 
|E°r-W«r 2 «uVi|E°rW«r°«u- 2 N |E^^ 

= [E°rW\|E°rW 2 |E°rW 3 |...|E°r-Wi|... 

...lE^.uVilE^.u-VlE-r^uVilE^.uV^lE^.uVsl...], 
= [E°r°«d^|E°rW 2 |E°rW 3 |...|E4 1 .u 0 i |... 
...|E 0 r 1 «uVi|E°r 0 «d 1 N |E 0 r 0 .u 0 N+1 |E 0 r 0 .u 0 N+2 |E 0 r 0 .u 0 N+3 |...], 

= [T 1 |A 2 |E 3 |...|G i |...|C n _ 1 |T n |E n+1 |E n+2 |E n+3 |...] 
= Dj +. 

(A.10) 
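
The operator vectors of Appendices D and E can both be obtained by the same componentwise rule: the exponent of r needed at each position is the difference of the letter indices, taken modulo 5. The sketch below illustrates this under the assumed encoding E = 0, C = 1, A = 2, T = 3, G = 4 (inferred from Appendix A), with a translation x[a, b] assumed to shift the index by a + 2b modulo 5. The sample sequences list only the components displayed in (A.9) and (A.10), padded with two 'E's, and all function names are hypothetical.

```python
BASES = "ECATG"   # assumed encoding: index k means E acted on by r^k
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "E": "E"}

def transition(d_from, d_to):
    """Exponents of r, component by component, that carry d_from to d_to."""
    return [(BASES.index(b) - BASES.index(a)) % 5 for a, b in zip(d_from, d_to)]

def act(d, shifts):
    """Componentwise action of a vector of r-exponents on a sequence."""
    return "".join(BASES[(BASES.index(x) + s) % 5] for x, s in zip(d, shifts))

def x(a, b):
    """Shift assumed for the translation x[a, b]: right = +1, up = +2, modulo 5."""
    return (a + 2 * b) % 5

# Appendix D: components 1, 2, 3, i, N-1, N of D_j and D_k, then two trailing 'E's.
dj, dk = "CAECTAEE", "GTCGCTEE"
b_jk = transition(dj, dk)
print(b_jk)           # [3, 1, 1, 3, 3, 1, 0, 0]

# Appendix E: the operator that sends a sequence to its complementary sequence.
dj2 = "ATECGAEE"
complement = "".join(COMPLEMENT[base] for base in dj2)
b_comp = transition(dj2, complement)
print(complement)     # TAEGCTEE
print(b_comp)         # [1, 4, 0, 3, 2, 1, 0, 0]
```

Under these assumptions the computed exponents agree with the displayed ones: for instance, x[0, -2] and x[-2, 0] correspond to the shifts 1 and 3 that carry A to T and C to G, respectively.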

Appendix F 

The axioms are:

I) A binary operation and closure law: the combination of two morphisms satisfies
hom(X, X) × hom(X, Y) → hom(X, Y). Moreover, hom(X, Y) × hom(Y, Z) → mor(X, Z)
and hom(Y, Z) × hom(Z, Zs) → mor(Y, Zs) both hold.

II) Associativity: if f: X → X, p: X → Y, τ: Y → Z, g: X → Z, h: Z → Z, and j: Z → Zs,
then 'f•(p•τ) = (f•p)•τ'; 'p•(τ•h) = (p•τ)•h'; 'f•(g•h) = (f•g)•h'; and 'τ•(h•j) =
(τ•h)•j' hold.

III) Identity: there exist morphisms '1_X, 1_Y, 1_Z, 1_Zs' such that '1_X•f = f = f•1_X' and
'1_Y•p = p = p•1_X'; '1_Z•τ = τ = τ•1_Y'; '1_Z•g = g = g•1_X'; '1_Z•h = h = h•1_Z';
'1_Zs•j = j = j•1_Z'. In practice,

'1_X = 1_Y = 1_Z = 1_Zs = [n_1|n_2|n_3|...|n_i|...|n_(N-1)|n_N|...]' satisfies these conditions.

(A.11)

Appendix G 

For category C_1,

morphism f_1 (= B_1 (∈ group B_1)): X_1 → X_1,
morphism p_1: X_1 → Y_1,
morphism τ_1: Y_1 → Z_1,
morphism g_1 (= p_1•τ_1): X_1 → Z_1,
morphism h_1 (= B_1 (∈ group B_1)): Z_1 → Z_1,
morphism j_1: Z_1 → Zs_1.

(A.12)

Similarly for category C_2, for each object F(X_1) = X_2, F(Y_1) = Y_2, F(Z_1) = Z_2, F(Zs_1) =
Zs_2 (∈ C_2), the following relationships also hold:

morphism F(f_1) (= f_2 = B_2 (∈ group B_2)): F(X_1) → F(X_1),
morphism F(p_1) (= p_2): F(X_1) → F(Y_1),
morphism F(τ_1) (= τ_2): F(Y_1) → F(Z_1),
morphism F(g_1) (= g_2 = p_2•τ_2): F(X_1) → F(Z_1),
morphism F(h_1) (= h_2 = B_2 (∈ group B_2)): F(Z_1) → F(Z_1),
morphism F(j_1) (= j_2): F(Z_1) → F(Zs_1).

Other than these, if the relationships F(f_1•p_1) = F(f_1)•F(p_1), F(p_1•τ_1) = F(p_1)•F(τ_1),
F(τ_1•h_1) = F(τ_1)•F(h_1), F(f_1•g_1) = F(f_1)•F(g_1), F(g_1•h_1) = F(g_1)•F(h_1), and
F(h_1•j_1) = F(h_1)•F(j_1) are satisfied, the composition of C_1 and C_2 linked with 'functor F'
is possible, although the proof is omitted here.

Furthermore, the following postulates hold: for object X (∈ C_1), 'F(1_X) = 1_F(X) (∈ C_2)' is
true, for object Y (∈ C_1), 'F(1_Y) = 1_F(Y) (∈ C_2)' and for object Z (∈ C_1), 'F(1_Z) = 1_F(Z) (∈ C_2)'
is true under the condition:

'F(1_X) = 1_F(X) = F(1_Y) = 1_F(Y) = F(1_Z) = 1_F(Z) = [(ω_0)_1|(ω_0)_2|(ω_0)_3|...|(ω_0)_i|...|(ω_0)_(n-1)|(ω_0)_n|...]'.

(A.14)